Abstract

This paper considers the model problem of reconstructing an object from incomplete
frequency samples. Consider a discrete-time signal $f \in \mathbb{C}^N$ and a randomly chosen set
of frequencies $\Omega$ of mean size $\tau N$. Is it possible to reconstruct $f$ from the partial
knowledge of its Fourier coefficients on the set $\Omega$?

A typical result of this paper is as follows: for each $M > 0$, suppose that $f$ obeys

$$\#\{t,\ f(t) \neq 0\} \le \alpha(M) \cdot (\log N)^{-1} \cdot \#\Omega,$$

then with probability at least $1 - O(N^{-M})$, $f$ can be reconstructed exactly as the
solution to the $\ell_1$ minimization problem

$$\min_g \sum_{t=0}^{N-1} |g(t)|, \quad \text{s.t. } \hat g(\omega) = \hat f(\omega) \text{ for all } \omega \in \Omega.$$

In short, exact recovery may be obtained by solving a convex optimization problem. 
We give numerical values for $\alpha$, which depends on the desired probability of success;
except for the logarithmic factor, the condition on the size of the support is sharp. 

The methodology extends to a variety of other setups and higher dimensions. For
example, we show how one can reconstruct a piecewise constant (one or two-dimensional)
object from incomplete frequency samples, provided that the number of jumps
(discontinuities) obeys the condition above, by minimizing other convex functionals such
as the total variation of $f$.

1 Introduction



In many applications of practical interest, we often wish to reconstruct an object (a discrete
signal, a discrete image, etc.) from incomplete Fourier samples. In a discrete setting, we
may pose the problem as follows: let $\hat f$ be the Fourier transform of a discrete object $f(t)$,
$t \in \mathbb{Z}_N^d := \{0, 1, \ldots, N-1\}^d$,

$$\hat f(\omega) = \sum_{t \in \mathbb{Z}_N^d} f(t) e^{-i\, \omega \cdot t}.$$

The problem is then to recover $f$ from partial frequency information, namely, from $\hat f(\omega)$,
where $\omega = (\omega_1, \ldots, \omega_d)$ belongs to some set $\Omega$ of cardinality less than $N^d$, the size of the
discrete object.

In this paper, we show that we can recover $f$ exactly from observations $\hat f|_\Omega$ on a small set
of frequencies provided that $f$ is sparse. The recovery consists of solving a straightforward
optimization problem that finds $f^\sharp$ of minimal complexity with $\hat f^\sharp(\omega) = \hat f(\omega)$, $\forall \omega \in \Omega$.



1.1 A puzzling numerical experiment 

This idea is best motivated by an experiment with surprisingly positive results. Consider
a simplified version of the classical 'tomography' problem in medical imaging: we wish to
reconstruct a 2D image $f(t_1, t_2)$ from samples $\hat f|_\Omega$ of its discrete Fourier transform on a
star-shaped domain $\Omega$ [4]. Our choice of domain is not contrived; many real imaging devices
can collect high-resolution samples along radial lines at relatively few angles. Figure 1(b)
illustrates a typical case where one gathers 512 samples along each of 22 radial lines.

Frequently discussed approaches in the literature of medical imaging for reconstructing an
object from 'polar' frequency samples are the so-called filtered backprojection algorithms.
In a nutshell, one assumes that the Fourier coefficients at all of the unobserved frequencies
are zero (thus reconstructing the image of "minimal energy" under the observation
constraints). This strategy does not perform very well, and could hardly be used for medical
diagnostics [?]. The reconstructed image, shown in Figure 1(c), has severe nonlocal
artifacts caused by the angular undersampling. A good reconstruction algorithm, it seems,
would have to guess the values of the missing Fourier coefficients. In other words, one would
need to interpolate $\hat f(\omega_1, \omega_2)$. This is highly problematic, however; predictions of Fourier
coefficients from their neighbors are very delicate, due to the global and highly oscillatory
nature of the Fourier transform. Going back to our example, we can see the problem
immediately. To recover frequency information near $(\omega_1, \omega_2)$, where $\omega_1$ is near $\pm\pi$, we would
need to interpolate $\hat f$ at the Nyquist rate $2\pi/N$. However, we only have samples at rate
about $\pi/22$; the sampling rate is almost 50 times smaller than the Nyquist rate!

We propose instead a strategy based on convex optimization. Let $\|g\|_{BV}$ be the
total-variation norm of a two-dimensional object $g$, which for discrete data $g(t_1, t_2)$,
$0 \le t_1, t_2 \le N-1$, takes the form

$$\|g\|_{BV} = \sum_{t_1, t_2} \sqrt{|D_1 g(t_1, t_2)|^2 + |D_2 g(t_1, t_2)|^2},$$

where $D_1$ is the finite difference $D_1 g = g(t_1, t_2) - g(t_1 - 1, t_2)$ and
$D_2 g = g(t_1, t_2) - g(t_1, t_2 - 1)$. To recover $f$ from partial Fourier samples, we find a
solution $f^\sharp$ to the optimization problem

$$\min \|g\|_{BV} \quad \text{subject to} \quad \hat g(\omega) = \hat f(\omega) \text{ for all } \omega \in \Omega. \qquad (1.1)$$

Figure 1: Example of a simple recovery problem. (a) The Logan-Shepp phantom test
image. (b) Sampling 'domain' in the frequency plane; Fourier coefficients are sampled along
22 approximately radial lines. (c) Minimum energy reconstruction obtained by setting
unobserved Fourier coefficients to zero. (d) Reconstruction obtained by minimizing the
total variation, as in (1.1). The reconstruction is an exact replica of the image in (a).

In a nutshell, given partial observation $\hat f|_\Omega$, we seek a solution $f^\sharp$ with minimum
complexity, here Total Variation (TV), and whose 'visible' coefficients match those of the
unknown object $f$. Our hope here is to partially erase some of the artifacts classical
reconstruction methods exhibit (which tend to have large TV norm) while maintaining fidelity
to the observed data via the constraints on the Fourier coefficients of the reconstruction.
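As a small illustration of the objective in (1.1), the sketch below evaluates the discrete total-variation norm of a tiny image. The function name `tv_norm` and the two test images are hypothetical, and the wrap-around treatment of the first row and column is an assumption of this sketch; the text does not specify the boundary convention in two dimensions.

```python
import math

def tv_norm(g):
    """Discrete total-variation norm of a 2D array g given as a list of rows."""
    n1, n2 = len(g), len(g[0])
    total = 0.0
    for t1 in range(n1):
        for t2 in range(n2):
            # Backward differences D1 and D2. Index -1 wraps to the last
            # row/column (periodic boundary), an assumption of this sketch.
            d1 = g[t1][t2] - g[t1 - 1][t2]
            d2 = g[t1][t2] - g[t1][t2 - 1]
            total += math.sqrt(abs(d1) ** 2 + abs(d2) ** 2)
    return total

# A constant image has no jumps; a 2x2 block of ones inside a 4x4 image does.
flat = [[1.0] * 4 for _ in range(4)]
block = [[1.0 if 1 <= i <= 2 and 1 <= j <= 2 else 0.0 for j in range(4)]
         for i in range(4)]

print(tv_norm(flat), tv_norm(block))
```

The constant image has zero TV norm, while the block image is penalized only along its boundary, which is why piecewise constant objects are the natural 'low complexity' class for this functional.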

When we use (1.1) for the recovery problem illustrated in Figure 1 (with the popular
Logan-Shepp phantom as a test image), the results are surprising. The reconstruction is exact;
that is, $f^\sharp = f$! Now this numerical result is not special to this phantom. In fact, we
performed a series of experiments of this type and obtained perfect reconstruction on many
similar test phantoms.



1.2 Main Results 

This paper is about a quantitative understanding of this very special phenomenon. For 
which classes of signals/images can we expect perfect reconstruction? What are the trade- 
offs between complexity and number of samples? In order to answer these questions, we 
first develop a fundamental mathematical understanding of a special one-dimensional model 
problem; we then exhibit reconstruction strategies which are shown to exactly reconstruct 
the unknown signal and can be deployed in many related and sophisticated reconstruction 
setups. 

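For a signal $f \in \mathbb{C}^N$, we define the classical discrete Fourier transform
$\mathcal{F} f = \hat f : \mathbb{C}^N \to \mathbb{C}^N$ by

$$\hat f(k) := \sum_{t=0}^{N-1} f(t) e^{-i \omega_k t}, \qquad \omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1. \qquad (1.2)$$

If we are given the value of the Fourier coefficients $\hat f(k)$ for all frequencies $k \in \mathbb{Z}_N$, then
one can obviously reconstruct $f$ exactly via the Fourier inversion formula

$$f(t) = \frac{1}{N} \sum_{k=0}^{N-1} \hat f(k) e^{i \omega_k t}. \qquad (1.3)$$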

Now suppose that we are only given the Fourier coefficients $\hat f|_\Omega$ sampled in some partial
subset $\Omega \subset \mathbb{Z}_N$ of all frequencies (here and below we abuse notation and identify the
frequencies $\omega_k = 2\pi k/N$ with the corresponding integers whenever convenient). Of course,
this is not enough information by itself to reconstruct $f$ exactly, since $f$ has $N$ degrees of
freedom and we are only specifying $|\Omega| < N$ of those degrees (here and below $|\Omega|$ denotes
the cardinality of $\Omega$).
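The following minimal sketch illustrates both statements numerically: the transform (1.2) is inverted exactly when all $N$ coefficients are known, while two different signals can share the same coefficients on a proper subset of frequencies. The helper names `dft` and `idft`, the test signal, and the particular subset `omega` are illustrative choices, not taken from the paper.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention of (1.2)."""
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(fhat):
    """Inverse transform: (1/N) * sum_k fhat(k) * exp(+i w_k t)."""
    n = len(fhat)
    return [sum(fhat[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

f = [0, 3, 0, 0, -1, 0, 0, 0]
fhat = dft(f)
max_err = max(abs(a - b) for a, b in zip(f, idft(fhat)))   # full data: exact inversion

# Partial data: any signal whose spectrum lives outside omega is invisible on omega.
omega = [0, 1, 2, 5]
ghost = idft([0 if k in omega else 1 for k in range(len(f))])
g = [a + b for a, b in zip(f, ghost)]
ghat = dft(g)
mismatch_on_omega = max(abs(ghat[k] - fhat[k]) for k in omega)
difference = max(abs(a - b) for a, b in zip(f, g))

print(max_err, mismatch_on_omega, difference)
```

Here `g` differs from `f` but agrees with it on every observed frequency, so without a further assumption such as sparsity the data on `omega` cannot single out `f`.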

Suppose, however, that we also specify that $f$ is supported on a small (but a priori unknown)
subset $T$ of $\mathbb{Z}_N$; that is, we assume that $f$ can be written as a sparse superposition of spikes

$$f = \sum_{t \in T} \alpha_t \delta_t, \qquad \delta_t(t') = 1_{\{t' = t\}}.$$

If $|T|$ is small enough, we can recover $f$ exactly:

Theorem 1.1 Suppose that the signal length $N$ is a prime integer. Let $\Omega$ be a subset of
$\{0, \ldots, N-1\}$, and let $f$ be a vector supported on $T$ such that

$$|T| \le \frac{1}{2} |\Omega|.$$

Then $f$ can be reconstructed uniquely from $\Omega$ and $\hat f|_\Omega$. Conversely, if $\Omega$ is not the set of all
$N$ frequencies, then there exist distinct vectors $f, g$ such that
$|\mathrm{supp}(f)|, |\mathrm{supp}(g)| \le \frac{1}{2}|\Omega| + 1$ and such that $\hat f|_\Omega = \hat g|_\Omega$.

Proof We will need the following lemma [18], from which we see that with knowledge of
$T$, we can reconstruct $f$ uniquely (using linear algebra) from $\hat f|_\Omega$:

Lemma 1.2 ([18], Corollary 1.4) Let $N$ be a prime integer and $T, \Omega$ be subsets of $\mathbb{Z}_N$.
Put $\ell_2(T)$ (resp. $\ell_2(\Omega)$) to be the space of signals that are zero outside of $T$ (resp. $\Omega$). The
restricted Fourier transform $\mathcal{F}_{T \to \Omega} : \ell_2(T) \to \ell_2(\Omega)$ is defined as

$$\mathcal{F}_{T \to \Omega} f := \hat f|_\Omega \quad \text{for all } f \in \ell_2(T).$$

If $|T| = |\Omega|$, then $\mathcal{F}_{T \to \Omega}$ is a bijection; as a consequence, we thus see that $\mathcal{F}_{T \to \Omega}$ is injective
for $|T| \le |\Omega|$ and surjective for $|T| \ge |\Omega|$. Clearly, the same claims hold if the Fourier
transform $\mathcal{F}$ is replaced by the inverse Fourier transform $\mathcal{F}^{-1}$.

To prove Theorem 1.1, we start with the former claim. Suppose for contradiction that
there were two objects $f, g$ such that $\hat f|_\Omega = \hat g|_\Omega$ and
$|\mathrm{supp}(f)|, |\mathrm{supp}(g)| \le \frac{1}{2}|\Omega|$. Then the
Fourier transform of $f - g$ vanishes on $\Omega$, and $|\mathrm{supp}(f - g)| \le |\Omega|$. By Lemma 1.2 we see
that $\mathcal{F}_{\mathrm{supp}(f-g) \to \Omega}$ is injective, and thus $f - g = 0$. The uniqueness claim follows.

Now we prove the latter claim. Since $|\Omega| < N$, we can find disjoint subsets $T, S$ of $\mathbb{Z}_N$ such
that $|T|, |S| \le \frac{1}{2}|\Omega| + 1$ and $|T| + |S| = |\Omega| + 1$. Let $k_0$ be some frequency which does not
lie in $\Omega$. Applying Lemma 1.2, we have that $\mathcal{F}_{T \cup S \to \Omega \cup \{k_0\}}$ is a bijection, and thus we can
find a vector $h$ supported on $T \cup S$ whose Fourier transform vanishes on $\Omega$ but is non-zero
on $k_0$; in particular, $h$ is not identically zero. The claim now follows by taking $f := h|_T$
and $g := -h|_S$. ■

Note that if $N$ is not prime, the lemma (and hence the theorem) fails, essentially because of
the presence of non-trivial subgroups of $\mathbb{Z}_N$ with addition modulo $N$; see [0], [18] for further
discussion. However, it is plausible to think that Lemma 1.2 continues to hold for non-prime
$N$ if $T$ and $\Omega$ are assumed to be generic; in particular, they are not subgroups of $\mathbb{Z}_N$, or
cosets of subgroups. If $T$ and $\Omega$ are selected uniformly at random, then it is expected that
the theorem holds with probability very close to one; one can indeed presumably quantify
this statement by adapting the arguments given above but we will not do so here. However,
we refer the reader to Section 1.6 for a rapid presentation of informal arguments pointing
in this direction.
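The role of primality in Lemma 1.2 can be checked by brute force in the smallest non-trivial case $|T| = |\Omega| = 2$. The sketch below is only an illustration: the function name `min_abs_det_2x2` and the sizes $N = 5$ and $N = 4$ are choices made here, and checking $2 \times 2$ minors does not of course prove the lemma.

```python
import cmath
from itertools import combinations

def min_abs_det_2x2(n):
    """Smallest |det| over all 2x2 minors of the n-point Fourier matrix."""
    def w(k, t):
        return cmath.exp(-2j * cmath.pi * k * t / n)
    best = float("inf")
    for k1, k2 in combinations(range(n), 2):        # Omega = {k1, k2}
        for t1, t2 in combinations(range(n), 2):    # T = {t1, t2}
            det = w(k1, t1) * w(k2, t2) - w(k1, t2) * w(k2, t1)
            best = min(best, abs(det))
    return best

prime_case = min_abs_det_2x2(5)       # every minor is invertible
composite_case = min_abs_det_2x2(4)   # T = Omega = {0, 2} gives a singular minor

print(prime_case, composite_case)
```

For $N = 4$ the singular minor comes from $T = \Omega = \{0, 2\}$, a subgroup of $\mathbb{Z}_4$, which matches the explanation given above for why the lemma fails when $N$ is not prime.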

A refinement of the argument in Theorem 1.1 shows that for fixed sets $T, S$ in $\mathbb{Z}_N$, the
space of vectors $f, g$ supported on $T, S$ such that $\hat f|_\Omega = \hat g|_\Omega$ has dimension $|T \cup S| - |\Omega|$
when $|T \cup S| \ge |\Omega|$, and has dimension $|T \cap S|$ otherwise. In particular, if we let $\Sigma(N_t)$
denote those vectors whose support has size at most $N_t$, then the set of vectors in $\Sigma(N_t)$
which cannot be reconstructed uniquely in this class from the Fourier coefficients sampled
at $\Omega$ is contained in a finite union of linear spaces of dimension at most $2N_t - |\Omega|$. Since
$\Sigma(N_t)$ itself is a finite union of linear spaces of dimension $N_t$, we thus see that recovery of $f$
from $\hat f|_\Omega$ is in principle possible generically whenever $|\mathrm{supp}(f)| = N_t < |\Omega|$; once $N_t \ge |\Omega|$,
however, it is clear from simple degrees-of-freedom arguments that unique recovery is no
longer possible. While our methods do not quite attain this theoretical upper bound for
correct recovery, our numerical experiments suggest that they do come within a constant
factor of this bound (see Figure 2).

Theorem 1.1 asserts that $f$ can be reconstructed from $\hat f|_\Omega$ if $|T| \le |\Omega|/2$ (and that this bound
is the best possible). In principle, we can recover $f$ exactly by solving the combinatorial
optimization problem

$$(P_0) \qquad \min_{g \in \mathbb{C}^N} \|g\|_{\ell_0}, \qquad \hat g|_\Omega = \hat f|_\Omega, \qquad (1.4)$$

where $\|g\|_{\ell_0}$ is the number of nonzero terms $\#\{t,\ g(t) \neq 0\}$. Solving (1.4) directly is
infeasible even for modest-sized signals. The algorithm would let $T$ run over all subsets
of $\{0, \ldots, N-1\}$ of cardinality $|T| \le \frac{1}{2}|\Omega|$, checking for each $T$ whether $\hat f|_\Omega$ is in
the range of $\mathcal{F}_{T \to \Omega}$ or not, and then inverting the relevant minor of the Fourier matrix to
recover $f$ once $T$ is determined. This procedure is computationally very expensive,
since there are exponentially many subsets to check;
for instance, for $|\Omega| \sim N/2$, this number scales like $4^N \cdot 3^{-3N/4}$. As an aside, note
that it is not clear how to make this algorithm robust, especially since the results in [18]
do not provide any effective lower bound on the determinant of the minors of the Fourier
matrix; this point is discussed later in the paper.
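The growth rate $4^N \cdot 3^{-3N/4}$ quoted above can be checked with a few lines of arithmetic. The sketch below counts only the supports of size exactly $N/4$ (the largest size allowed when $|\Omega| = N/2$), which is a simplification made here; these dominate the total count, so the exponential rate is the same. The name `log_count_per_sample` is hypothetical.

```python
import math

def log_count_per_sample(n):
    """(1/N) * log of the number of candidate supports T with |T| = N/4."""
    return math.log(math.comb(n, n // 4)) / n

# Per-sample log of the rate quoted in the text: log(4 * 3**(-3/4)).
predicted = math.log(4) - 0.75 * math.log(3)
rates = {n: log_count_per_sample(n) for n in (40, 400, 4000)}

for n, rate in rates.items():
    print(n, round(rate, 4), round(predicted, 4))
```

The per-sample rate approaches the predicted constant from below as $N$ grows, so already for signals of a few hundred samples the number of supports to test is astronomically large.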

A more computationally efficient strategy for recovering $f$ from $\Omega$ and $\hat f|_\Omega$ is to solve the
convex problem

$$(P_1) \qquad \min_{g \in \mathbb{C}^N} \|g\|_{\ell_1} := \sum_{t \in \mathbb{Z}_N} |g(t)|, \qquad \hat g|_\Omega = \hat f|_\Omega. \qquad (1.5)$$

The key result in this paper is that the solutions to $(P_0)$ and $(P_1)$ are equivalent for an
overwhelming percentage of the choices for $T$ and $\Omega$ with $|T| \le \alpha \cdot |\Omega| / \log N$ ($\alpha > 0$ is a
constant): in these cases, solving the convex problem $(P_1)$ recovers $f$ exactly.

To establish this upper bound, we will assume that the observed Fourier coefficients are
randomly sampled. To make this precise, we introduce a probability parameter $0 < \tau < 1$,
and consider the sequence $(I_k)_{0 \le k < N}$ of independent Bernoulli random variables

$$I_k = \begin{cases} 0 & \text{with prob. } 1 - \tau, \\ 1 & \text{with prob. } \tau. \end{cases} \qquad (1.6)$$

We then define the random set of frequencies $\Omega$ as

$$\Omega := \{k : I_k = 1\}. \qquad (1.7)$$

Clearly, $|\Omega|$ follows the binomial distribution and

$$\mathbf{E}(|\Omega|) = \tau N. \qquad (1.8)$$
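The sampling model (1.6)-(1.7) is straightforward to simulate. The sketch below draws the random set several times and compares its size with the mean $\tau N$ from (1.8); the function name `random_frequency_set`, the seed, and the values $N = 10000$, $\tau = 0.1$ are illustrative assumptions.

```python
import random

def random_frequency_set(n, tau, rng):
    """Draw Omega = {k : I_k = 1} with independent Bernoulli(tau) indicators."""
    return [k for k in range(n) if rng.random() < tau]

rng = random.Random(0)            # fixed seed so the sketch is reproducible
n, tau = 10_000, 0.1
sizes = [len(random_frequency_set(n, tau, rng)) for _ in range(20)]
mean_size = sum(sizes) / len(sizes)

print(min(sizes), mean_size, max(sizes), tau * n)
```

Since $|\Omega|$ is binomial, its standard deviation here is $\sqrt{N\tau(1-\tau)} = 30$, small compared with the mean of 1000.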

In fact, classical large deviations arguments (or the central limit theorem) tell us that with
high probability, the size of $|\Omega|$ is very close to $\tau N$. Our main theorem can now be stated
as follows.






Theorem 1.3 Let $f \in \mathbb{C}^N$ be a discrete signal and $\Omega$ be the random set defined in (1.7).
For a given accuracy parameter $M$, if $f$ is supported on $T$ and

$$|T| \le \alpha(M) \cdot (\log N)^{-1} \cdot \tau N, \qquad (1.9)$$

then with probability at least $1 - O(N^{-M})$, the minimizer to the problem (1.5) is unique
and is equal to $f$.

In light of (1.8), we see that (1.9) is essentially $|T| \sim |\Omega|$, modulo a constant and a
logarithmic factor. Indeed, an easy modification to the second part of Theorem 1.1 shows that the
condition (1.9) cannot be weakened to (for instance) $|\mathrm{supp}(f)| \le (\frac{1}{2} + \epsilon)\tau N$, for any $\epsilon > 0$.
The paper gives an explicit value of $\alpha(M)$, namely, $\alpha(M) \asymp 1/[29.6(M + 1)]$, although we
have not pursued the question of exactly what the optimal value might be.

In a later section we present numerical results which suggest that in practice, we can expect to
recover $f$ more than 50% of the time if $|T| \le |\Omega|/4$. For $|T| \le |\Omega|/8$, the recovery rate is
above 90%. Empirically, the constants 1/4 and 1/8 do not seem to vary for $N$ in the range
of a few hundred to a few thousand.

1.3 For Almost Every $\Omega$

As the theorem suggests, there exist sets $\Omega$ and functions $f$ for which the $\ell_1$-minimization
procedure does not recover $f$ correctly, even if $|\mathrm{supp}(f)|$ is much smaller than $|\Omega|$. We
sketch two counter-examples:

• Dirac's comb. Suppose that $N$ is a perfect square and consider the picket-fence signal
which consists of spikes of unit height and with uniform spacing equal to $\sqrt{N}$. This
signal is often used as an extremal point for uncertainty principles [?] as one of its
remarkable properties is its invariance through the Fourier transform. Hence suppose
that $\Omega$ is the set of all frequencies but the multiples of $\sqrt{N}$, namely, $|\Omega| = N - \sqrt{N}$.
Then $\hat f|_\Omega = 0$ and obviously the reconstruction is identically zero.

Note that the problem here does not really have anything to do with $\ell_1$-minimization
per se; $f$ cannot be reconstructed from its Fourier samples on $\Omega$, thereby showing that
Theorem 1.1 does not work 'as is' for arbitrary sample sizes.

• Box signals. The example above suggests that in some sense $|T|$ must not be greater
than about $\sqrt{|\Omega|}$. In fact, there exist more extreme examples. Assume the sample
size $N$ is large and consider for example the indicator function $f$ of the interval
$T := \{t : -N^{0.01} < t < N^{0.01}\}$ and let $\Omega$ be the set $\Omega := \{k : N/3 < k < 2N/3\}$. Let
$h$ be a function whose Fourier transform $\hat h$ is a non-negative bump function adapted
to the interval $\{k : -N/6 < k < N/6\}$ which equals 1 when $-N/12 < k < N/12$.
Then $|h(t)|^2$ has Fourier transform vanishing in $\Omega$, and is rapidly decreasing away
from $t = 0$; in particular we have $|h(t)|^2 = O(N^{-100})$ for $t \notin T$. On the other hand,
one easily computes that $|h(0)|^2 > c$ for some absolute constant $c > 0$. Because of
this, the signal $f - \epsilon |h|^2$ will have smaller $\ell_1$-norm than $f$ for $\epsilon > 0$ sufficiently small
(and $N$ sufficiently large), while still having the same Fourier coefficients as $f$ on $\Omega$.
Thus in this case $f$ is not the minimizer to the problem $(P_1)$, despite the fact that
the support of $f$ is much smaller than that of $\Omega$.

The above counterexamples relied heavily on the special choice of $\Omega$ (and to a lesser extent
of $\mathrm{supp}(f)$); in particular, they needed the fact that the complement of $\Omega$ contained a large
interval (or more generally, a long arithmetic progression). But for most sets $\Omega$, large
arithmetic progressions in the complement do not exist, and the problem largely disappears.
In short, Theorem 1.3 essentially says that for most sets $\Omega$ with $|T| \sim |\Omega|$ (up to the
logarithmic factor), exact recovery holds.
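The Dirac comb counterexample above is easy to verify numerically. The sketch below uses $N = 16$ (so the spacing is $\sqrt{N} = 4$); the helper `dft` follows the convention of (1.2), and all names and sizes are illustrative choices.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention of (1.2)."""
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n, root = 16, 4
comb = [1.0 if t % root == 0 else 0.0 for t in range(n)]   # spikes every sqrt(N) samples
comb_hat = dft(comb)

# Omega: all frequencies except the multiples of sqrt(N), so |Omega| = N - sqrt(N).
omega = [k for k in range(n) if k % root != 0]
largest_on_omega = max(abs(comb_hat[k]) for k in omega)
off_omega = [round(abs(comb_hat[k]), 9) for k in range(n) if k % root == 0]

print(len(omega), largest_on_omega, off_omega)
```

The transform of the comb is again a comb (scaled by $\sqrt{N}$), so all 12 observed coefficients vanish and the data on this particular $\Omega$ cannot distinguish the comb from the zero signal.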

1.4 Extensions 

As mentioned earlier, results on our model problem extend easily to higher dimensions as
well as to other setups. To be concrete, consider the problem of recovering a one-dimensional
piecewise constant signal via

$$\min_g \sum_{t=0}^{N-1} |g(t) - g(t-1)|, \qquad \hat g|_\Omega = \hat f|_\Omega, \qquad (1.10)$$

where we adopt the convention that $g(-1) = g(N-1)$. In a nutshell, model (1.5) is
obtained from (1.10) after differentiation. Indeed, let $\delta$ be the vector of first differences
$\delta(t) = g(t) - g(t-1)$, and note that $\sum_t \delta(t) = 0$. Obviously,

$$\hat\delta(\omega) = (1 - e^{-i\omega}) \hat g(\omega), \quad \text{for all } \omega \neq 0,$$

and, therefore, with $\upsilon(\omega) = 1 - e^{-i\omega}$, the problem is identical to

$$\min_\delta \|\delta\|_{\ell_1}, \qquad \hat\delta|_{\Omega \setminus \{0\}} = (\upsilon \hat f)|_{\Omega \setminus \{0\}}, \quad \hat\delta(0) = 0,$$

which is precisely what we have been studying.
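The differentiation step can be checked numerically on a small piecewise constant signal. In the sketch below the signal `g` and the helper `dft` are illustrative; Python's negative indexing conveniently implements the convention $g(-1) = g(N-1)$.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention of (1.2)."""
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

g = [2.0, 2.0, 2.0, 5.0, 5.0, -1.0, -1.0, -1.0]    # piecewise constant, periodic
n = len(g)
delta = [g[t] - g[t - 1] for t in range(n)]         # g[-1] is g[n-1]
g_hat, delta_hat = dft(g), dft(delta)

# Check delta_hat(w) = (1 - exp(-i w)) * g_hat(w) at every frequency w = 2 pi k / N.
residual = max(
    abs(delta_hat[k] - (1 - cmath.exp(-2j * cmath.pi * k / n)) * g_hat[k])
    for k in range(n)
)
jumps = sum(1 for d in delta if d != 0)

print(residual, jumps, sum(delta))
```

The difference vector is sparse (three nonzero entries, one per jump) and sums to zero, so the total-variation problem for `g` is an $\ell_1$ problem for `delta` with rescaled Fourier data.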

Corollary 1.4 Put $T = \{t,\ f(t) \neq f(t-1)\}$. Under the assumptions of Theorem 1.3,
the minimizer to the problem (1.10) is unique and is equal to $f$ with probability at least
$1 - O(N^{-M})$, provided, of course, that $f$ be adjusted so that $\sum_t f(t) = \hat f(0)$.

We now explore versions of Theorem 1.3 in higher dimensions. To be concrete, consider
the two-dimensional situation (statements in arbitrary dimensions are exactly of the same
flavor):

Theorem 1.5 Put $N = n^2$. We let $f(t_1, t_2)$, $1 \le t_1, t_2 \le n$, be a discrete signal and $\Omega$ be
the random set defined as in (1.7). Assume that for a given accuracy parameter $M$, $f$ is
supported on $T$ obeying (1.9). Then with probability at least $1 - O(N^{-M})$, the minimizer
to the problem (1.5) is unique and is equal to $f$.

We will not prove this result as the strategy is exactly parallel to that of Theorem 1.3.
Just as in the one-dimensional case, a similar statement for piecewise constant functions
exists provided, of course, that the support of $f$ be replaced by
$\{(t_1, t_2) : |D_1 f(t_1, t_2)|^2 + |D_2 f(t_1, t_2)|^2 \neq 0\}$. We omit the details.

We hope that we have managed to suggest that there actually are a variety of results similar to
Theorem 1.3, of which we only selected a few instances. As a matter of fact, these provide a
precise quantitative understanding of the 'surprising result' discussed at the beginning of
this paper.






1.5 Relationship to Uncertainty Principles 

From a certain point of view, our results are connected to the so-called uncertainty principles
[?] which say that it is difficult to localize a signal $f \in \mathbb{C}^N$ both in time and frequency
at the same time. Indeed, classical arguments show that $f$ is the unique minimizer of $(P_1)$
if and only if

$$\sum_{t \in \mathbb{Z}_N} |f(t) + h(t)| > \sum_{t \in \mathbb{Z}_N} |f(t)|, \qquad \forall h \neq 0,\ \hat h|_\Omega = 0.$$

Put $T = \mathrm{supp}(f)$ and apply the triangle inequality

$$\sum_{\mathbb{Z}_N} |f(t) + h(t)| = \sum_{T} |f(t) + h(t)| + \sum_{T^c} |h(t)| \ge \sum_{T} \big(|f(t)| - |h(t)|\big) + \sum_{T^c} |h(t)|.$$

Hence, a sufficient condition to establish that $f$ is our unique solution would be to show
that

$$\sum_{T} |h(t)| < \sum_{T^c} |h(t)| \qquad \forall h \neq 0,\ \hat h|_\Omega = 0,$$

or equivalently $\sum_T |h(t)| < \frac{1}{2}\|h\|_{\ell_1}$. The connection with the uncertainty principle is now
explicit; $f$ is the unique minimizer if it is impossible to 'concentrate' half of the $\ell_1$ norm
of a signal that is missing frequency components in $\Omega$ on a 'small' set $T$. For example, [?]
guarantees exact reconstruction if

$$2|T| \cdot (N - |\Omega|) < N.$$

Take $|\Omega| < N/2$, then that condition says that $|T|$ must be zero which, of course, is far
from being the content of Theorem 1.3. In truth, this paper does not follow this classical
approach. Instead, we will use duality theory to study the solution of $(P_1)$.

1.6 Robust Uncertainty Principles 

Underlying our analysis is a new notion of uncertainty principle which holds for almost
any pair $(\mathrm{supp}(f), \mathrm{supp}(\hat f))$. With $T = \mathrm{supp}(f)$ and $\Omega = \mathrm{supp}(\hat f)$, the classical discrete
uncertainty principle [0] says that

$$|T| + |\Omega| \ge 2\sqrt{N}, \qquad (1.11)$$

with equality obtained for signals such as the Dirac's comb. As we mentioned above, such
extremal signals correspond to very special pairs $(T, \Omega)$. However, for most choices of $T$
and $\Omega$, the analysis presented in this paper shows that it is impossible to find $f$ such that
$T = \mathrm{supp}(f)$ and $\Omega = \mathrm{supp}(\hat f)$ unless

$$|T| + |\Omega| \ge \gamma(M) \cdot (\log N)^{-1/2} \cdot N, \qquad (1.12)$$

which is considerably stronger than (1.11). Here, the statement 'most pairs' says again
that the probability of selecting a random pair $(T, \Omega)$ violating (1.12) is at most $O(N^{-M})$.
(We are of course aware of numerical studies in [?] pointing out the lack of sharpness of the
uncertainty principle when $T$ is random.)

In some sense, (1.12) is the typical uncertainty relation one can generally expect (as opposed
to (1.11)), hence justifying the title of this paper. Because of space limitations, we are unable
to elaborate on this fact and its implications any further, but will do so in a companion
paper.

1.7 Connections with existing work



The idea of relaxing a combinatorial problem into a convex problem is not new and goes
back a long way. For example, [?] used the idea of minimizing $\ell_1$ norms to recover
spike trains. The motivation is that this makes available a host of computationally feasible
procedures. For example, a convex problem of the type (1.5) can be practically solved using
techniques of linear programming such as interior point methods [3].

Now, there exists some evidence that in special situations the solution to an $\ell_1$
minimization problem coincides with the unique minimizer of the $\ell_0$ problem. For
example, a series of beautiful papers [?] is concerned with a special setup where
one is given a dictionary $D$ of vectors (waveforms) of $\mathbb{C}^N$, $D = (d_k)_{1 \le k \le M}$, and one seeks
sparse representations of a signal $f \in \mathbb{C}^N$ as a superposition of elements of $D$

$$f = D\alpha. \qquad (1.13)$$

Suppose that the number of elements $M$ from $D$ is greater than the sample size $N$, then
there are many ways in which one can represent $f$ as a superposition of elements from $D$
and one would want to find the 'sparsest' one. Consider the solution which minimizes the
$\ell_0$ norm of $\alpha$ subject to the constraint (1.13) and that which minimizes the $\ell_1$ norm. A
typical result of this body of work is as follows: suppose that $f$ can be synthesized out of
very few elements from $D$, then the solutions to both problems are unique and are equal.
We also refer to [19, 20] for very recent results along these lines.

This literature certainly influenced our thinking in the sense that it made us suspect that results
such as Theorem 1.3 were actually possible. However, we would like to emphasize that the
claims presented in this paper are of a substantially different nature. We give essentially
two reasons:

• First, our model problem is different since we need to 'guess' a signal from incomplete
data, as opposed to finding the sparsest expansion of a fully specified signal.

• And second, our approach is decidedly probabilistic, as opposed to deterministic,
and thus calls for very different techniques. For example, underlying our analysis are
delicate estimates about the size of random matrices, which may be of independent
interest.

Besides the wonderful properties of $\ell_1$, there is a second line of research connected to our
findings. We can think of recovering a sparse superposition of spikes from an incomplete set
of observations in the Fourier domain as a spectral estimation problem, provided we swap
time and frequency: $f$ is a superposition of a few complex sinusoids whose frequency and
amplitude we need to determine from a few samples. From this point of view, our work is
related to [10, 11] and [25], where the authors study sampling patterns allowing the exact
reconstruction of a signal. These references show that the locations and amplitudes of a
sequence of $|T|$ spikes can be recovered exactly from $2|T| + 1$ consecutive Fourier coefficients
(in [25], for example, the recovery requires solving a system of equations and factoring a
polynomial). Our results, namely, Theorems 1.1 and 1.3, are quite distinct and far more
general since they address the radically different situation in which we do not have the
freedom to choose the samples at our convenience.

Finally, it is interesting to note that our results and the references above are also related
to recent work [22] on finding near-best P-term Fourier approximations (which is in some
sense the dual to our recovery problem). The algorithm in [22, 23], which operates by
estimating the frequencies present in the signal from a small number of randomly placed
samples, produces with high probability an approximation in sublinear time with error
within a constant of the best P-term approximation. First, in [23] the samples are again
selected to be equispaced whereas we are not at liberty to choose the frequency samples at
all since they are specified a priori. And second, we wish to produce as a result an entire
signal or image of size $N$, so a sublinear algorithm is an impossibility.



2 Strategy 



It is clear that at least one minimizer to $(P_1)$ exists. On the other hand, it is not apparent
why this minimizer should be unique, and why it should equal $f$. In this section, we
outline our strategy for answering these questions. Using duality theory, we will be able
to derive necessary and sufficient conditions for $(P_1)$ to recover $f$. We note that a similar
duality approach was independently developed in [?] for finding sparse approximations
from general dictionaries.



2.1 Duality 

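To get a feel for the line of argumentation, consider first the case where $f$ is real-valued.
Then (1.5) can be written as the linear program

$$\min_{g^+, g^- \ge 0} \sum_{t=0}^{N-1} \big(g^+(t) + g^-(t)\big), \qquad \mathcal{F}_\Omega(g^+ - g^-) = \hat f|_\Omega, \qquad (2.1)$$

where $g^+(t) = \max(g(t), 0)$, $g^-(t) = -\min(g(t), 0)$, and the matrix $\mathcal{F}_\Omega$ contains only the
rows of the Fourier transform matrix corresponding to entries in $\Omega$. The corresponding
Lagrangian is

$$L(g^+, g^-; \lambda, \mu^+, \mu^-) = \sum_{t=0}^{N-1} \big(g^+(t) + g^-(t)\big) + \lambda^H\big(\hat f|_\Omega - \mathcal{F}_\Omega(g^+ - g^-)\big) - (\mu^+)^T g^+ - (\mu^-)^T g^- \qquad (2.2)$$

with $\mu^+, \mu^- \ge 0$. At a minimum $(g^+, g^-)$, there will be a saddle point in $L$, and we will
have

$$\begin{aligned}
\mathcal{F}_\Omega(g^+ - g^-) &= \hat f|_\Omega, \\
(\mu^+)^T g^+ = (\mu^-)^T g^- &= 0, \\
\frac{\partial L}{\partial g^+(t)} = 1 - (\mathcal{F}_\Omega^* \lambda)(t) - \mu^+(t) &= 0, \\
\frac{\partial L}{\partial g^-(t)} = 1 + (\mathcal{F}_\Omega^* \lambda)(t) - \mu^-(t) &= 0.
\end{aligned}$$

Then for $f$ to be the minimum of (2.1), we need

$$(\mathcal{F}_\Omega^* \lambda)(t) = \mathrm{sgn}(f)(t), \qquad t \in T \qquad (2.3)$$
$$1 - (\mathcal{F}_\Omega^* \lambda)(t) - \mu^+(t) = 0, \qquad t \in T^c \qquad (2.4)$$
$$1 + (\mathcal{F}_\Omega^* \lambda)(t) - \mu^-(t) = 0, \qquad t \in T^c \qquad (2.5)$$

with $\mu^+(t), \mu^-(t) \ge 0$. In fact, for $f$ to be the unique minimizer of (2.1), it is necessary and
sufficient for there to exist a $\lambda$ such that for $P(t) = (\mathcal{F}_\Omega^* \lambda)(t)$, we have

$$P(t) = \mathrm{sgn}(f)(t), \qquad t \in T \qquad (2.6)$$
$$|P(t)| < 1, \qquad t \notin T. \qquad (2.7)$$

Thus, to show that $f^\sharp$ is unique and is equal to $f$, it suffices to find a trigonometric
polynomial $P$ whose Fourier transform is supported in $\Omega$ (in other words, which only uses
frequencies in $\Omega$), which matches $\mathrm{sgn}(f)$ on $\mathrm{supp}(f)$, and has magnitude strictly less
than 1 elsewhere. The following lemma generalizes this to the case where $f$ is complex-valued.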

Lemma 2.1 Let $\Omega \subset \mathbb{Z}_N$. For a vector $f \in \mathbb{C}^N$, define the 'sign' vector $\mathrm{sgn}(f)$ by
$\mathrm{sgn}(f)(t) := f(t)/|f(t)|$ when $t \in \mathrm{supp}(f)$ and $\mathrm{sgn}(f) = 0$ otherwise. Suppose there
exists a vector $P$ whose Fourier transform $\hat P$ is supported in $\Omega$ such that

$$P(t) = \mathrm{sgn}(f)(t) \quad \text{for all } t \in \mathrm{supp}(f)$$

and

$$|P(t)| < 1 \quad \text{for all } t \notin \mathrm{supp}(f).$$

• Then if $\mathcal{F}_{\mathrm{supp}(f) \to \Omega}$ is injective, the minimizer $f^\sharp$ to the problem $(P_1)$ (1.5) is unique
and is equal to $f$.

• Conversely, if $f$ is the unique minimizer of $(P_1)$, then there exists a vector $P$ with the
above properties.
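As a concrete instance of the conditions in Lemma 2.1, consider a single spike observed on every frequency except $k = 0$. The sketch below checks by hand-picked construction that a vector $P$ with the required properties exists in this toy case; the signal, the set `omega`, and the choice of `P` are illustrative assumptions made here and are not the construction used in the paper.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention of (1.2)."""
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 8
omega = set(range(1, n))                  # every frequency except k = 0
f = [2.5] + [0.0] * (n - 1)               # one spike at t = 0, so sgn(f)(0) = 1
P = [1.0] + [-1.0 / (n - 1)] * (n - 1)    # zero-mean vector with P(0) = 1

P_hat = dft(P)
leak = max(abs(P_hat[k]) for k in range(n) if k not in omega)   # P_hat outside Omega
matches_sign = abs(P[0] - 1.0) < 1e-12                          # P = sgn(f) on supp(f)
off_support_max = max(abs(P[t]) for t in range(1, n))           # |P| off supp(f)

print(leak, matches_sign, off_support_max)
```

Because $P$ has zero mean, its transform vanishes at the one unobserved frequency, and $|P(t)| = 1/7 < 1$ away from the spike. The restricted transform of a single spike is trivially injective, so the lemma guarantees that $(P_1)$ recovers this $f$.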

Proof We may assume that $\Omega$ is non-empty and that $f$ is non-zero since the claims are
trivial otherwise.

Suppose first that such a function $P$ exists. Let $g$ be any vector not equal to $f$ with
$\hat g|_\Omega = \hat f|_\Omega$. Write $h := g - f$, then $\hat h$ vanishes on $\Omega$. Observe that for any $t \in \mathrm{supp}(f)$ we
have

$$\begin{aligned}
|g(t)| &= |f(t) + h(t)| \\
&= \big| |f(t)| + h(t)\overline{\mathrm{sgn}(f)(t)} \big| \\
&\ge |f(t)| + \mathrm{Re}\big(h(t)\overline{\mathrm{sgn}(f)(t)}\big) \\
&= |f(t)| + \mathrm{Re}\big(h(t)\overline{P(t)}\big)
\end{aligned}$$

while for $t \notin \mathrm{supp}(f)$ we have $|g(t)| = |h(t)| \ge \mathrm{Re}(h(t)\overline{P(t)})$ since $|P(t)| < 1$. Thus

$$\|g\|_{\ell_1} \ge \|f\|_{\ell_1} + \sum_{t=0}^{N-1} \mathrm{Re}\big(h(t)\overline{P(t)}\big).$$

However, Parseval's formula gives

$$\sum_{t=0}^{N-1} \mathrm{Re}\big(h(t)\overline{P(t)}\big) = \frac{1}{N} \sum_{k=0}^{N-1} \mathrm{Re}\big(\hat h(k)\overline{\hat P(k)}\big) = 0$$

since $\hat P$ is supported on $\Omega$ and $\hat h$ vanishes on $\Omega$. Thus $\|g\|_{\ell_1} \ge \|f\|_{\ell_1}$. Now we check when
equality can hold, i.e. when $\|g\|_{\ell_1} = \|f\|_{\ell_1}$. An inspection of the above argument shows
that this forces $|h(t)| = \mathrm{Re}(h(t)\overline{P(t)})$ for all $t \notin \mathrm{supp}(f)$. Since $|P(t)| < 1$, this forces $h$ to
vanish outside of $\mathrm{supp}(f)$. Since $\hat h$ vanishes on $\Omega$, we thus see that $h$ must vanish identically
(this follows from the assumption about the injectivity of $\mathcal{F}_{\mathrm{supp}(f) \to \Omega}$) and so $g = f$. This
shows that $f$ is the unique minimizer $f^\sharp$ to the problem (1.5).

Conversely, suppose that $f = f^\sharp$ is the unique minimizer to (1.5). Without loss of
generality we may normalize $\|f\|_{\ell_1} = 1$. Then the closed unit ball $B := \{g : \|g\|_{\ell_1} \le 1\}$
and the affine space $V := \{g : \hat g|_\Omega = \hat f|_\Omega\}$ intersect at exactly one point, namely $f$.
By the Hahn-Banach theorem we can thus find a function $P$ such that the hyperplane
$\Gamma_1 := \{g : \sum \mathrm{Re}(g(t)\overline{P(t)}) = 1\}$ contains $V$, and such that the half-space
$\Gamma_{\le 1} := \{g : \sum \mathrm{Re}(g(t)\overline{P(t)}) \le 1\}$ contains $B$. By perturbing the hyperplane if necessary (and using
the uniqueness of the intersection of $B$ with $V$) we may assume that $\Gamma_1 \cap B$ is contained in
the minimal facet of $B$ which contains $f$, namely $\{g \in B : \mathrm{supp}(g) \subseteq \mathrm{supp}(f)\}$.

Since $B$ lies in $\Gamma_{\le 1}$, we see that $\sup_t |P(t)| \le 1$; since $f \in \Gamma_1 \cap B$, we have $P(t) = \mathrm{sgn}(f)(t)$
when $t \in \mathrm{supp}(f)$. Since $\Gamma_1 \cap B$ is contained in the minimal facet of $B$ containing $f$, we
see that $|P(t)| < 1$ when $t \notin \mathrm{supp}(f)$. Since $\Gamma_1$ contains $V$, we see from Parseval that $\hat P$ is
supported in $\Omega$. The claim follows. ■

Since the space of functions with Fourier transform supported in $\Omega$ has $|\Omega|$ degrees of
freedom, and the condition that $P$ match $\mathrm{sgn}(f)$ on $\mathrm{supp}(f)$ requires $|\mathrm{supp}(f)|$ degrees
of freedom, one now expects heuristically (if one ignores the open conditions that $P$ has
magnitude strictly less than 1 outside of $\mathrm{supp}(f)$) that $f^\sharp$ should be unique and be equal
to $f$ whenever $|\mathrm{supp}(f)| \ll |\Omega|$; in particular this gives an explicit procedure for recovering
$f$ from $\Omega$ and $\hat f|_\Omega$.



2.2 Architecture of the Argument 

Equipped with our duality theorem, we are now in a position to present the main ideas
of the argument. Fix $f$. We may assume that $\tau N > M \log N$ since the claim is vacuous
otherwise (as we will see, $\alpha(M) = O(1/M)$ and thus (1.9) will force $f = 0$, at which point
it is clear that the solution to $(P_1)$ is equal to $f = 0$).

We let T C Zjy denote the support of /, T := supp(/). Let O be the random set defined 
by (|1.7|) . Since tN > M log N, a typical application of the large deviation theorem shows 
that the cardinality of Q is if course close to that of its expected value, e.g. 

P(|fi| < E|0| - 1) < exp(-t 2 /2E|fi|). (2.8) 

Slightly more precise estimates are possible, see pQ. It then follows that 



$$P(|\Omega| < (1 - \epsilon_M)|\tau N|) \le N^{-M}, \qquad \epsilon_M := \sqrt{\frac{2M \log N}{|\tau N|}}. \qquad (2.9)$$

In the sequel it will be convenient to denote by $B_M$ the event $\{|\Omega| < (1 - \epsilon_M)|\tau N|\}$.

In light of Lemma 2.1 it suffices (with probability $1 - O(N^{-M})$) to (1) show that
the matrix $\mathcal{F}_{\mathrm{supp}(f)\to\Omega}$ has full rank, and (2) construct a trigonometric polynomial $P(t)$,
$0 \le t \le N - 1$, whose Fourier transform is supported on $\Omega$, matches $\mathrm{sgn}(f)$ on $T$, and has
magnitude strictly less than 1 outside of $T$. To do this we shall need some auxiliary linear
transformations (i.e. matrices) as we will see next.

In this section, we will work with vectors restricted to the set $T$ and it will be convenient
to let $\ell_2(T)$ denote the subspace of such restrictions (and similarly $\ell_2(\mathbb{Z}_N) := \mathbb{C}^N$). With
these notations, we let $H : \ell_2(T) \to \ell_2(\mathbb{Z}_N)$ denote the linear transform defined by

$$Hf(t) := -\sum_{\omega\in\Omega}\ \sum_{t'\in T:\, t'\ne t} e^{i\omega(t-t')} f(t'). \qquad (2.10)$$

Let $\iota : \ell_2(T) \to \ell_2(\mathbb{Z}_N)$ be the obvious embedding of $\ell_2(T)$ into $\ell_2(\mathbb{Z}_N)$ (extending by zero
outside of $T$), and let $\iota^* : \ell_2(\mathbb{Z}_N) \to \ell_2(T)$ be the dual restriction map, thus $\iota^* f := f|_T$.
Observe that $\iota^*\iota : \ell_2(T) \to \ell_2(T)$ is simply the identity operator on $\ell_2(T)$, and that the
operator $\iota^* H : \ell_2(T) \to \ell_2(T)$ is self-adjoint.

The key point is that the terms in (2.10) are rather oscillatory, since we have stripped out
the non-oscillatory diagonal $t = t'$; indeed, the main idea of the argument will be to use
the randomization of $\Omega$ to treat $H$ as a "white noise" operator whose eventual effect will
be negligible, especially if $H$ is raised to a high power.

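To see the relevance of the operator $H$ to our problem, observe that for all $f \in \ell_2(T)$

$$\Big(\iota - \frac{1}{|\Omega|}H\Big)f(t) = \frac{1}{|\Omega|}\sum_{\omega\in\Omega}\sum_{t'\in T} e^{i\omega(t-t')} f(t') = \frac{1}{|\Omega|}\sum_{\omega\in\Omega} \hat f(\omega)\, e^{i\omega t}$$

with $\hat f(\omega)$ the Fourier coefficient of $f$ evaluated at the frequency $\omega$. In particular, $(\iota - \frac{1}{|\Omega|}H)f$
has Fourier transform supported in $\Omega$. Next, suppose for the moment that the self-adjoint
operator $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ from $\ell_2(T)$ to itself is invertible, and then set $P(t)$, $0 \le t \le N - 1$, to
be the trigonometric polynomial

$$P := \Big(\iota - \frac{1}{|\Omega|}H\Big)\Big(\iota^*\iota - \frac{1}{|\Omega|}\iota^* H\Big)^{-1}\iota^*\,\mathrm{sgn}(f). \qquad (2.11)$$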

Then by the preceding discussion: 

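• Frequency support. $P$ has Fourier transform supported in $\Omega$;

• Spatial interpolation. $P$ obeys

$$\iota^* P = \Big(\iota^*\iota - \frac{1}{|\Omega|}\iota^* H\Big)\Big(\iota^*\iota - \frac{1}{|\Omega|}\iota^* H\Big)^{-1}\iota^*\,\mathrm{sgn}(f) = \iota^*\,\mathrm{sgn}(f),$$

and so $P$ agrees with $\mathrm{sgn}(f)$ on $T$.

Consider now the invertibility issue. By definition

$$\iota^*\iota - \frac{1}{|\Omega|}\iota^* H = \frac{1}{|\Omega|}\,\mathcal{F}_{T\to\Omega}^*\,\mathcal{F}_{T\to\Omega}.$$

Hence, the invertibility of $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ implies that $\mathcal{F}_{T\to\Omega}$ is injective. In summary, to
prove the theorem it will suffice to show that:

• Invertibility. The operator $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ is invertible (with probability $1 - O(N^{-M})$).

• Magnitude on $T^c$. The function $P$ defined in (2.11) obeys the bound $\sup_{t\in T^c} |P(t)| < 1$
(with probability $1 - O(N^{-M})$).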
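As an illustration only, the construction (2.11) can be sketched numerically in plain Python. Everything below is an assumption made for the sketch rather than part of the argument: the names `solve`, `dual_polynomial` and `dft` are hypothetical helpers, frequencies are indexed by integers with the kernel normalized as $e^{2\pi i \omega t/N}$, the unit-modulus values `sgn` stand in for $\mathrm{sgn}(f)$ on $T$, and $N$ is taken prime so that the small linear system is solvable. The sketch uses the fact that $\iota - \frac{1}{|\Omega|}H$ has kernel $\frac{1}{|\Omega|}\sum_{\omega\in\Omega} e^{i\omega(t-t')}$.

```python
import cmath
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def dual_polynomial(N, T, Omega, sgn):
    """P = (iota - H/|Omega|)(iota* iota - iota* H/|Omega|)^{-1} iota* sgn, evaluated on Z_N."""
    def kernel(t, tp):
        # (1/|Omega|) * sum over Omega of exp(2*pi*i*w*(t - tp)/N)
        return sum(cmath.exp(2j * cmath.pi * w * (t - tp) / N) for w in Omega) / len(Omega)
    A = [[kernel(t, tp) for tp in T] for t in T]   # iota* iota - iota* H / |Omega|
    coeffs = solve(A, sgn)
    return [sum(kernel(t, tp) * c for tp, c in zip(T, coeffs)) for t in range(N)]

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * w * t / N) for t in range(N))
            for w in range(N)]

random.seed(0)
N = 31                                             # prime (assumption of the sketch)
T = sorted(random.sample(range(N), 3))
Omega = sorted(random.sample(range(N), 15))
sgn = [cmath.exp(2j * cmath.pi * random.random()) for _ in T]
P = dual_polynomial(N, T, Omega, sgn)
P_hat = dft(P)
```

The two structural properties (frequency support and spatial interpolation) hold by construction; the bound $\sup_{t \in T^c}|P(t)| < 1$ is the probabilistic part of the argument and is not checked by this sketch.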



We first consider the former claim. 



3 Construction of the Dual Polynomial 

3.1 Invertibility 

We would like to establish invertibility of the matrix $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ with high probability. One
obvious way to proceed would be to show that the operator norm or equivalently the largest
eigenvalue of $\iota^* H$ is less than $|\Omega|$. This is easily done if $|\mathrm{supp}(f)|$ is extremely small (e.g.
much less than $\sqrt{|\Omega|}$), simply by estimating the operator norm directly by the Frobenius
norm $\|\cdot\|_F$, which is easy to compute explicitly. Recall that for any square matrix $M$,
the Frobenius norm $\|M\|_F$ of $M$ is defined by the formula

$$\|M\|_F^2 := \mathrm{Tr}(MM^*) = \sum_{i,j} |M(i,j)|^2,$$

and obeys $\|M\| \le \|M\|_F$. However, this simple approach does not work well when $|\mathrm{supp}(f)|$
is large, say equal to $\alpha \cdot (\log N)^{-1} \cdot |\Omega|$. In this case, we have to resort to estimating the
Frobenius norm of a large power of $\iota^* H$, taking advantage of cancellations arising from the
randomness of the matrix coefficients of $\iota^* H$.

We state the key estimate of this section. 



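Theorem 3.1 Put $H_0 = \iota^* H$ for short, where $H$ is the operator defined by (2.10). Set
$c_\tau := e \log((1 - \tau)/\tau)$ and let

$$a_n = (2n-1)^{2n}\, c_\tau^{-(2n-1)}\, N\, |T|^{2n}, \qquad b_n = \frac{(2n)!}{n!\,2^n}\Big(\frac{\tau}{1-\tau}\Big)^n N^n |T|^{n+1}.$$

Then

$$E[\mathrm{Tr}(H_0^{2n})] \le n\,\phi^{2n} \max(a_n, b_n). \qquad (3.1)$$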



In most interesting situations $a_n$ is less than $b_n$, which allows us to reformulate (3.1) slightly.
Note that the classical Stirling approximation to $n!$ gives

$$\frac{(2n)!}{n!\,2^n} \le 2^{n+1/2} e^{-n} n^n \le 2^{n+1} n^n e^{-n}$$

and, therefore, letting $\phi$ be the 'golden ratio' $\phi := (1 + \sqrt{5})/2$, the $2n$th moment obeys

$$E(\mathrm{Tr}(H_0^{2n})) \le 2 e^{-n} \gamma^{2n} n^{n+1} \cdot |\tau N|^n |T|^{n+1}, \qquad \gamma^2 = \frac{2\phi^2}{1 - \tau}, \qquad (3.2)$$

provided that $a_n$ obeys

$$a_n \le 2^{n+1} e^{-n} n^n N^n |T|^n. \qquad (3.3)$$
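The Stirling-type bound on $(2n)!/(n!\,2^n)$ quoted above is easy to spot-check. The following is a minimal sketch; the function names are hypothetical, and the check covers only a finite range of $n$, so it illustrates the inequality rather than proving it.

```python
import math

def double_factorial_ratio(n):
    """Exact value of (2n)! / (n! * 2**n), i.e. (2n-1)(2n-3)...1."""
    return math.factorial(2 * n) // (math.factorial(n) * 2 ** n)

def stirling_upper(n):
    """The bound 2^(n+1/2) * e^(-n) * n^n quoted in the text."""
    return 2 ** (n + 0.5) * math.exp(-n) * n ** n

# True for every n in the tested range if the bound holds there.
checks = [double_factorial_ratio(n) <= stirling_upper(n) for n in range(1, 41)]
```

The bound is nearly tight: the ratio of the two sides tends to 1 as $n$ grows, which is why the cruder constant $2^{n+1}$ costs only a factor $\sqrt{2}$.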

Theorem 3.1 gives a precise estimate about the operator norm of $H_0$. To see why this is
true, assume that (3.3) holds; since $H_0$ is self-adjoint

$$\|H_0\|^{2n} = \|H_0^n\|^2 \le \|H_0^n\|_F^2 = \mathrm{Tr}(H_0^{2n})$$

and, therefore,

$$(E\|H_0\|)^{2n} \le E\|H_0\|^{2n} \le (2n)\, \gamma^{2n} e^{-n} n^n |T|^{n+1} |\tau N|^n.$$

Now selecting $n = \lceil \log |T| \rceil$ so that

$$e^{-n} n^n |T| \le \lceil \log |T| \rceil^n$$

gives

$$E\|H_0\| \le \gamma \cdot \sqrt{\log |T|} \cdot \sqrt{|T|\,|\tau N|} \cdot (1 + o(1)), \quad \text{as } |T| \to \infty.$$

Formalizing matters, we proved

Corollary 3.2 Suppose $|T| \le (\log |\tau N|)\,|\tau N|$. Then for any $\epsilon > 0$, we have

$$P\Big(\|H_0\| > (1 + \epsilon)\, \gamma\, \sqrt{\log |T|}\, \sqrt{|T|\,|\tau N|}\Big) \to 0 \quad \text{as } |T|, |\tau N| \to \infty.$$

Proof The Markov inequality above bounds the probability by $(1 + \epsilon)^{-2n}$ which goes to
zero as $n = \lceil \log |T| \rceil$ goes to infinity. ■

We now return to the study of the invertibility of $\iota^*\iota - \frac{1}{|\Omega|}H_0$. Letting $\alpha$ be a positive
number $0 < \alpha < 1$, it follows from the Markov inequality that

$$P(\|H_0^n\|_F \ge \alpha^n \cdot |\tau N|^n) \le \frac{E\|H_0^n\|_F^2}{\alpha^{2n} |\tau N|^{2n}}.$$

We then apply inequality (3.1) (recall $\|H_0^n\|_F^2 = \mathrm{Tr}(H_0^{2n})$) and obtain

$$P(\|H_0^n\|_F \ge \alpha^n \cdot |\tau N|^n) \le (2n)\, e^{-n} \Big(\frac{n \gamma^2 |T|}{\alpha^2 |\tau N|}\Big)^n |T|. \qquad (3.4)$$

We remark that the last inequality holds for any sample size $|T|$ (provided that the condition
(3.3) holds) and we now specialize (3.4) to selected values of $|T|$.

Suppose that $|T|$ obeys

$$|T| \le \frac{\alpha_M^2 |\tau N|}{\gamma^2 n} < |T| + 1, \quad \text{for some } \alpha_M \le \alpha. \qquad (3.5)$$

Then

$$P(\|H_0^n\|_F \ge \alpha^n \cdot |\tau N|^n) \le 2(\alpha^2/\gamma^2)\, e^{-n} |\tau N|.$$
We then have the following result.

Theorem 3.3 Assume that $\tau \le .44$, say, and suppose that $T$ obeys (3.5). Then (3.3) holds
for any $n \ge 4$, and therefore

$$P(\|H_0^n\|_F \ge \alpha^n \cdot |\tau N|^n) \le 2(\alpha/\gamma)^2 e^{-n} |\tau N|. \qquad (3.6)$$

The only thing to establish is that $T$ obeys (3.3). This is merely technical and the proof is
in the Appendix.

With the notations of the previous section and especially (2.9), observe now that

$$P(\|H_0^n\|_F \ge \alpha^n |\Omega|^n) \le P(\|H_0^n\|_F \ge \alpha^n (1 - \epsilon_M)^n |\tau N|^n) + P(|\Omega| < (1 - \epsilon_M)|\tau N|),$$

where we recall that $B_M := \{|\Omega| < (1 - \epsilon_M)|\tau N|\}$ has probability less than $N^{-M}$. Suppose
$T$ obeys (3.5) with $\alpha_M := \alpha(1 - \epsilon_M)$ instead of $\alpha$; then

$$P(\|H_0^n\|_F \ge \alpha^n (1 - \epsilon_M)^n \cdot |\tau N|^n) \le 2(\alpha/\gamma)^2 e^{-n} |\tau N|.$$



Corollary 3.4 Take $n = (M + 1) \log N$. We see from the Neumann series that the operator
$\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ is invertible with probability at least $1 - (1 + 2/\gamma^2) N^{-M}$ since $\iota^*\iota$ is the identity
on vectors supported on $T$.

We have thus established the invertibility of $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ with high probability, and thus $P$
is well defined with high probability. It remains to show that $\sup_{t\in T^c} |P(t)| < 1$ with high
probability.



3.2 Magnitude of the polynomial on the complement of T 

We first develop an expression for $P(t)$ by making use of the algebraic identity

$$(1 - M)^{-1} = (1 - M^n)^{-1}(1 + M + \dots + M^{n-1}).$$

Indeed, we can write

$$\Big(\iota^*\iota - \frac{1}{|\Omega|^n}(\iota^* H)^n\Big)^{-1} = \iota^*\iota + R, \qquad R := \sum_{p \ge 1} \frac{1}{|\Omega|^{pn}} (\iota^* H)^{pn},$$

so that the inverse is given by the truncated Neumann series

$$\Big(\iota^*\iota - \frac{1}{|\Omega|}\iota^* H\Big)^{-1} = (\iota^*\iota + R) \sum_{m=0}^{n-1} \frac{1}{|\Omega|^m} (\iota^* H)^m. \qquad (3.7)$$

The point is that the remainder term $R$ is quite small in the Frobenius norm: suppose that
$\|(\iota^* H)^n\|_F \le \alpha^n \cdot |\Omega|^n$; then

$$\|R\|_F \le \frac{\alpha^n}{1 - \alpha^n}.$$

In particular, the matrix coefficients of $R$ are all individually less than $\alpha^n/(1 - \alpha^n)$. Introduce
the $\ell_\infty$-norm of a matrix as $\|M\|_\infty = \sup_{\|x\|_\infty \le 1} \|Mx\|_\infty$, which is also given by

$$\|M\|_\infty = \sup_i \sum_j |M(i,j)|.$$

Now, it follows from the Cauchy-Schwarz inequality that

$$\|M\|_\infty^2 \le \sup_i\ \#M(\mathrm{col}) \sum_j |M(i,j)|^2 \le \#M(\mathrm{col}) \cdot \|M\|_F^2,$$

where $\#M(\mathrm{col})$ is of course the number of columns of $M$. This observation gives the crude
estimate

$$\|R\|_\infty \le |T|^{1/2} \cdot \frac{\alpha^n}{1 - \alpha^n}. \qquad (3.8)$$

As we shall soon see, the bound (3.8) allows us to effectively neglect the $R$ term in this
formula; the only remaining difficulty will be to establish good bounds on the truncated
Neumann series $\frac{1}{|\Omega|} H \sum_{m=0}^{n-1} \frac{1}{|\Omega|^m} (\iota^* H)^m$.



3.3 Estimating the truncated Neumann series 

From (2.11) we observe that on the complement of $T$

$$P = -\frac{1}{|\Omega|} H \Big(\iota^*\iota - \frac{1}{|\Omega|}\iota^* H\Big)^{-1} \iota^*\,\mathrm{sgn}(f),$$

since the $\iota$ component in (2.11) vanishes outside of $T$. Applying (3.7), we may rewrite $P$ as

$$P(t) = P_0(t) + P_1(t), \quad \forall t \in T^c,$$

where

$$P_0 = S_n\,\mathrm{sgn}(f), \qquad P_1 = -\frac{1}{|\Omega|} H R\, \iota^* (I - S_{n-1})\,\mathrm{sgn}(f)$$

and

$$S_n = -\sum_{m=1}^{n} |\Omega|^{-m} (H\iota^*)^m.$$

Let $a_0, a_1 > 0$ be two numbers with $a_0 + a_1 = 1$. Then

$$P\Big(\sup_{t\in T^c} |P(t)| > 1\Big) \le P(\|P_0\|_\infty > a_0) + P(\|P_1\|_\infty > a_1),$$

and the idea is to bound each term individually. Put $Q_0 = S_{n-1}\,\mathrm{sgn}(f)$ so that $P_1 =
-\frac{1}{|\Omega|} H R\, \iota^* (\mathrm{sgn}(f) - Q_0)$. With these notations, observe that

$$\|P_1\|_\infty \le \frac{1}{|\Omega|} \|HR\|_\infty (1 + \|\iota^* Q_0\|_\infty).$$

Hence, bounds on the magnitude of $P_1$ will follow from bounds on $\|HR\|_\infty$ together with
bounds on the magnitude of $\iota^* Q_0$. It will of course be sufficient to derive bounds on $\|Q_0\|_\infty$
(since $\|\iota^* Q_0\|_\infty \le \|Q_0\|_\infty$), which will follow from those on $P_0$ since $Q_0$ is nearly equal to
$P_0$ (they differ by only one very small term).

Fix $t_0 \in T^c$ and write $P_0(t_0)$ as

$$P_0(t_0) = -\sum_{m=1}^{n} |\Omega|^{-m} X_m(t_0), \qquad X_m = (H\iota^*)^m\,\mathrm{sgn}(f).$$

The idea is to use moment estimates to control the size of each term $X_m(t_0)$.






Lemma 3.5 Set $n = km$. Then $E|X_m(t_0)|^{2k}$ obeys the same estimate as that in Theorem
3.1 (up to a multiplicative factor $|T|^{-1}$), namely,

$$E|X_m(t_0)|^{2k} \le \frac{1}{|T|}\, n\, \phi^{2n} \max(a_n, b_n). \qquad (3.9)$$

In particular, following (3.2)

$$E|X_m(t_0)|^{2k} \le 2 e^{-n} \gamma^{2n} n^{n+1} \cdot |T|^n |\tau N|^n, \qquad (3.10)$$

where $\gamma$ is as before.

The proof of these moment estimates mimics that of Theorem 3.1 and may be found in the
Appendix.

Lemma 3.6 Fix $a_0 = .91$. Suppose that $|T|$ obeys (3.5) and let $B_M$ be the set where
$|\Omega| < (1 - \epsilon_M) \cdot |\tau N|$ with $\epsilon_M$ as in (2.9). For each $t \in \mathbb{Z}_N$, there is a set $A_t$ with the
property

$$P(A_t) > 1 - \epsilon_n, \qquad \epsilon_n = 2(1 - \epsilon_M)^{-2n} \cdot n^2 e^{-n} \alpha^{2n} \cdot (0.42)^{-2n},$$

and

$$|P_0(t)| < .91, \quad |Q_0(t)| < .91 \quad \text{on } A_t \cap B_M^c.$$

As a consequence,

$$P\Big(\sup_t |P_0(t)| > a_0\Big) \le N^{-M} + N \epsilon_n,$$

and similarly for $Q_0$.

Proof We suppose that $n$ is of the form $n = 2^J - 1$ (this property is not crucial and only
simplifies our exposition). For each $m$ and $k$ such that $km \ge n$, it follows from (3.5)
and (3.10) together with some simple calculations that

$$E|X_m(t)|^{2k} \le 2n\, e^{-n} \alpha^{2n} \cdot |\tau N|^{2n}. \qquad (3.11)$$

Again $|\Omega| \approx |\tau N|$ and we will develop a bound on the set where $|\Omega| \ge (1 - \epsilon_M)|\tau N|$.
On this set

$$|P_0(t)| \le \sum_{m=1}^{n} Y_m, \qquad Y_m = \frac{1}{(1 - \epsilon_M)^m |\tau N|^m}\, |X_m(t)|.$$

Fix $\beta_j > 0$, $0 \le j < J$, such that $\sum_{j=0}^{J-1} 2^j \beta_j \le a_0$. Obviously,

$$P\Big(\sum_{m=1}^{n} Y_m > a_0\Big) \le \sum_{j=0}^{J-1} \sum_{m=2^j}^{2^{j+1}-1} P(Y_m > \beta_j) \le \sum_{j=0}^{J-1} \sum_{m=2^j}^{2^{j+1}-1} \beta_j^{-2K_j}\, E|Y_m|^{2K_j},$$

where $K_j = 2^{J-j}$. Observe that for each $m$ with $2^j \le m < 2^{j+1}$, $K_j m$ obeys $n < K_j m \le 2n$
and, therefore, (3.11) gives

$$E|Y_m|^{2K_j} \le (1 - \epsilon_M)^{-2n} \cdot (2n\, e^{-n} \alpha^{2n}).$$

For example, taking $\beta_j^{-K_j}$ to be constant for all $j$, i.e. equal to $\beta_0^{-n}$, gives

$$P\Big(\sum_{m=1}^{n} Y_m > a_0\Big) \le 2(1 - \epsilon_M)^{-2n} \cdot n^2 e^{-n} \alpha^{2n} \cdot \beta_0^{-2n},$$

with $\sum_{j=0}^{J-1} 2^j \beta_j \le a_0$. Numerical calculations show that for $\beta_0 = .42$, $\sum_j 2^j \beta_j \le .91$, which
gives

$$P\Big(\sum_{m=1}^{n} Y_m > .91\Big) \le 2(1 - \epsilon_M)^{-2n} \cdot n^2 e^{-n} \alpha^{2n} \cdot (0.42)^{-2n}. \qquad (3.12)$$

The claim for $Q_0$ is, of course, identical and the lemma follows. ■
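The numerical claim behind the constant $.91$ can be reproduced in a few lines. This is a sketch under an explicit assumption: the weights are taken to be $\beta_j = \beta_0^{2^j}$, which is what keeping $\beta_j^{K_j}$ constant gives when $K_j$ is proportional to $2^{-j}$. The function name is hypothetical.

```python
def weighted_beta_sum(beta0, J):
    """Sum over j = 0..J-1 of 2^j * beta_j, with beta_j = beta0 ** (2 ** j) (assumed form)."""
    return sum(2 ** j * beta0 ** (2 ** j) for j in range(J))

# The terms decay doubly exponentially, so the sum stabilizes after a handful of terms.
total = weighted_beta_sum(0.42, 12)
```

With $\beta_0 = .42$ the first four terms already account for essentially the whole sum, which stays just below $.91$ for every $J$.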
Lemma 3.7 Fix $a_1 = .09$. Suppose that the pair $(\alpha, N)$ obeys $|\tau N|^{3/2} \frac{\alpha^n}{1 - \alpha^n} \le a_1/2$. Then

$$\|P_1\|_\infty \le a_1$$

on the event $A \cap \{\|(\iota^* H)^n\|_F \le \alpha^n |\Omega|^n\}$, for some $A$ obeying $P(A) \ge 1 - O(N^{-M})$.

Proof As we observed before, (1) $\|P_1\|_\infty \le \frac{1}{|\Omega|}\|H\|_\infty \|R\|_\infty (1 + \|Q_0\|_\infty)$, and (2) $Q_0$ obeys
the bound stated in Lemma 3.6. Consider then the event $\{\|Q_0\|_\infty \le 1\}$. On this event,
$\|P_1\|_\infty \le a_1$ if $\frac{1}{|\Omega|}\|H\|_\infty \|R\|_\infty \le a_1/2$. The matrix $H$ obeys $\frac{1}{|\Omega|}\|H\|_\infty \le |T|$ since $H$ has
$|T|$ columns and each matrix element is bounded by $|\Omega|$ (note that far better bounds are
possible). It then follows from (3.8) that

$$\frac{1}{|\Omega|}\|H\|_\infty \cdot \|R\|_\infty \le |T|^{3/2} \cdot \frac{\alpha^n}{1 - \alpha^n}$$

with probability at least $1 - O(N^{-M})$. We then simply need to choose $\alpha$ and $n$ such that
the right-hand side is less than $a_1/2$. ■



3.4 Proof of Theorem 1.3

It is now clear that we have assembled all the intermediate results to prove our theorem.
Indeed, we proved the invertibility of $\iota^*\iota - \frac{1}{|\Omega|}\iota^* H$ with probability at least $1 - O(N^{-M})$ and $|P(t)| < 1$
for all $t \in T^c$ (again with high probability), provided that $\alpha$ and $n$ be selected appropriately
as we now explain.

Fix $M > 0$. We choose $\alpha = .42$ and $n$ to be the nearest integer to $(M + 1) \log N$.

1. From the discussion following Theorem 3.3 it follows that $\iota^*\iota - |\Omega|^{-1}\iota^* H$ is invertible
with probability at least $1 - O(N^{-M})$.

2. With this special choice, $\epsilon_n = 2[(M + 1) \log N]^2 \cdot N^{-(M+1)}$ and, therefore, Lemma 3.6
implies that both $P_0$ and $Q_0$ are bounded by .91 on $T^c$ with probability at
least $1 - [1 + 2((M + 1) \log N)^2] \cdot N^{-M}$.

3. And finally, to prove that $|P_1(t)| \le .09$ on $T^c$, Lemma 3.7 assures that it is
sufficient to have $N^{3/2} \alpha^n/(1 - \alpha^n) \le .045$. Because $\log(.42) \approx -.87$ and $\log(.045) \approx
-3.10$, this condition is approximately equivalent to

$$(1.5 - .87(M + 1)) \log N \le -3.10.$$

Take $M \ge 2$, for example; then the above inequality is satisfied as soon as $N \ge 17$.
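The arithmetic in step 3 can be checked directly. The sketch below only evaluates the approximate inequality $(1.5 - .87(M+1)) \log N \le -3.10$ with the rounded constants quoted in the text; `smallest_N` is a hypothetical helper name, and the check says nothing about the exact condition before rounding.

```python
import math

def smallest_N(M, slope=0.87, rhs=-3.10):
    """Smallest integer N >= 2 with (1.5 - slope*(M+1)) * log(N) <= rhs."""
    coeff = 1.5 - slope * (M + 1)
    if coeff >= 0:
        raise ValueError("the inequality cannot hold when the coefficient is nonnegative")
    N = 2
    while coeff * math.log(N) > rhs:
        N += 1
    return N

N_min = smallest_N(2)   # threshold for M = 2
```

For $M = 2$ the coefficient is $1.5 - 2.61 = -1.11$, so the condition reads $\log N \ge 3.10/1.11 \approx 2.79$, which first holds at $N = 17$.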



To conclude, we proved that if $T$ obeys

$$|T| \le \alpha(M)\, \frac{|\tau N|}{\log N}, \qquad \alpha(M) = \frac{.42^2}{\gamma^2 (M + 1)} \cdot (1 + o(1)),$$

then the reconstruction is exact with probability exceeding $1 - O([(M + 1) \log N]^2 \cdot N^{-M})$. In other
words, we may take $\alpha(M)$ in Theorem 1.3 to be of the form

$$\alpha(M) = \frac{1}{29.6\,(M + 1)} \cdot (1 + o(1)). \qquad (3.13)$$
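The constant 29.6 in (3.13) can be recovered from the golden ratio. The sketch below assumes that $\gamma^2 = 2\phi^2/(1 - \tau)$ and takes the limit $\tau \to 0$, so that $\gamma^2 \to 2\phi^2$; both the assumed form of $\gamma^2$ and the variable names are illustrative.

```python
import math

phi = (1 + math.sqrt(5)) / 2             # golden ratio
gamma_sq = 2 * phi ** 2                  # assumed limiting value of gamma^2 as tau -> 0
inv_alpha_coeff = gamma_sq / 0.42 ** 2   # alpha(M) ~ 1 / (inv_alpha_coeff * (M + 1))
```

The value is approximately 29.68, consistent with the constant 29.6 quoted above once the $(1 + o(1))$ factor is taken into account.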



4 Moments of Random Matrices 



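4.1 A First Formula for the Expected Value of the Trace of $(H_0)^{2n}$

Recall that $H_0(t,t')$, $t,t' \in T$, is the $|T| \times |T|$ matrix whose entries are defined by

$$H_0(t,t') = \begin{cases} 0, & t = t', \\ -c(t - t'), & t \ne t', \end{cases} \qquad c(u) := \sum_{\omega\in\Omega} e^{i\omega u}.$$

A diagonal element of the $2n$th power of $H_0$ may be expressed as

$$H_0^{2n}(t_1,t_1) = \sum_{t_2,\dots,t_{2n}:\ t_j \ne t_{j+1}} c(t_1 - t_2) \cdots c(t_{2n} - t_1),$$

where we adopt the convention that $t_{2n+1} = t_1$ whenever convenient and, therefore,

$$E(\mathrm{Tr}(H_0^{2n})) = \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}} E\Big[\sum_{\omega_1,\dots,\omega_{2n}\in\Omega} e^{i\sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})}\Big]. \qquad (4.1)$$

Using (1.7) and linearity of expectation, we can write this as

$$\sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}}\ \sum_{0 \le \omega_1,\dots,\omega_{2n} \le N-1} e^{i\sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})}\ E\Big[\prod_{j=1}^{2n} I_{\{\omega_j \in \Omega\}}\Big].$$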

The idea is to use the independence of the $I_{\{\omega_j \in \Omega\}}$'s to simplify this expression substantially;
however, one has to be careful with the fact that some of the $\omega_j$'s may be the same, at
which point one loses independence of those indicator variables. These difficulties require
a certain amount of notation. We let $\mathbb{Z}_N = \{0, 1, \dots, N - 1\}$ be the set of all frequencies
as before, and let $A$ be the finite set $A := \{1, \dots, 2n\}$. For all $\omega := (\omega_1, \dots, \omega_{2n})$, we define
the equivalence relation $\sim_\omega$ on $A$ by saying that $j \sim_\omega j'$ if and only if $\omega_j = \omega_{j'}$. We let
$\mathcal{P}(A)$ be the set of all equivalence relations on $A$. Note that there is a partial ordering on
the equivalence relations as one can say that $\sim_1 \le \sim_2$ if $\sim_1$ is coarser than $\sim_2$, i.e. $a \sim_2 b$
implies $a \sim_1 b$ for all $a, b \in A$. Thus, the coarsest element in $\mathcal{P}(A)$ is the trivial equivalence
relation in which all elements of $A$ are equivalent (just one equivalence class), while the
finest element is the equality relation $=$, i.e. each element of $A$ belongs to a distinct class
($|A|$ equivalence classes).

For each equivalence relation $\sim$ in $\mathcal{P}$, we can then define the sets $\Omega(\sim) \subset \mathbb{Z}_N^{2n}$ by

$$\Omega(\sim) := \{\omega \in \mathbb{Z}_N^{2n} : \sim_\omega = \sim\}$$

and the sets $\Omega_\le(\sim) \subset \mathbb{Z}_N^{2n}$ by

$$\Omega_\le(\sim) := \bigcup_{\sim' \in \mathcal{P}:\ \sim' \le \sim} \Omega(\sim').$$

Thus the sets $\{\Omega(\sim) : \sim \in \mathcal{P}\}$ form a partition of $\mathbb{Z}_N^{2n}$. The sets $\Omega_\le(\sim)$ can also be defined
as

$$\Omega_\le(\sim) := \{\omega \in \mathbb{Z}_N^{2n} : \omega_a = \omega_b \text{ whenever } a \sim b\}.$$

For comparison, the sets $\Omega(\sim)$ can be defined as

$$\Omega(\sim) := \{\omega \in \mathbb{Z}_N^{2n} : \omega_a = \omega_b \text{ whenever } a \sim b, \text{ and } \omega_a \ne \omega_b \text{ whenever } a \not\sim b\}.$$

We give an example: suppose $n = 2$ and fix $\sim$ such that $1 \sim 4$ and $2 \sim 3$ (exactly 2
equivalence classes); then $\Omega(\sim) := \{\omega \in \mathbb{Z}_N^4 : \omega_1 = \omega_4,\ \omega_2 = \omega_3, \text{ and } \omega_1 \ne \omega_2\}$ while
$\Omega_\le(\sim) := \{\omega \in \mathbb{Z}_N^4 : \omega_1 = \omega_4,\ \omega_2 = \omega_3\}$.
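For a small alphabet the two sets in this example can be enumerated by brute force. The sketch below is illustrative only: the function names are hypothetical, indices are shifted to start at 0, and $N = 5$ is an arbitrary choice.

```python
from itertools import product

def omega_exact(N):
    """Omega(~) for n = 2 with classes {1,4}, {2,3}: equal within classes, distinct across."""
    return [w for w in product(range(N), repeat=4)
            if w[0] == w[3] and w[1] == w[2] and w[0] != w[1]]

def omega_leq(N):
    """Omega_<=(~): only the equalities w1 = w4 and w2 = w3 are imposed."""
    return [w for w in product(range(N), repeat=4)
            if w[0] == w[3] and w[1] == w[2]]

N = 5
exact, leq = omega_exact(N), omega_leq(N)
```

Here $\Omega_\le(\sim)$ has $N^2$ elements and $\Omega(\sim)$ has $N(N-1)$; the difference consists of the $N$ tuples with all four coordinates equal, which belong to $\Omega(\sim')$ for the coarser, trivial relation $\sim'$.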

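Now, let us return to the computation of the expected value. Because the random variables
$I_k$ in (1.6) are independent and have all the same distribution, the quantity $E[\prod_{j=1}^{2n} I_{\omega_j}]$
depends only on the equivalence relation $\sim_\omega$ and not on the value of $\omega$ itself. Indeed,
we have

$$E\Big(\prod_{j=1}^{2n} I_{\omega_j}\Big) = \tau^{|A/\sim|},$$

where $A/\sim$ denotes the equivalence classes of $\sim$. Thus we can rewrite the preceding
expression as

$$E(\mathrm{Tr}(H_0^{2n})) = \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}}\ \sum_{\sim \in \mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega \in \Omega(\sim)} e^{i\sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})} \qquad (4.2)$$

where $\sim$ ranges over all equivalence relations.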

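We would like to pause here and consider (4.2). Take $n = 1$, for example. There are only
two equivalence relations on $\{1, 2\}$ and, therefore, the right-hand side is equal to

$$\sum_{t_1,t_2:\ t_1 \ne t_2} \Big[\tau \sum_{\omega \in \mathbb{Z}_N^2:\ \omega_1 = \omega_2} e^{i\omega_1(t_1 - t_2) + i\omega_2(t_2 - t_1)} + \tau^2 \sum_{\omega \in \mathbb{Z}_N^2:\ \omega_1 \ne \omega_2} e^{i\omega_1(t_1 - t_2) + i\omega_2(t_2 - t_1)}\Big].$$

Our goal is to rewrite the expression inside the brackets so that the exclusion $\omega_1 \ne \omega_2$ does
not appear any longer, i.e. we would like to rewrite the sum over $\omega \in \mathbb{Z}_N^2 : \omega_1 \ne \omega_2$ in
terms of sums over $\omega \in \mathbb{Z}_N^2 : \omega_1 = \omega_2$, and over $\omega \in \mathbb{Z}_N^2$. In this special case, this is quite
easy as

$$\sum_{\omega \in \mathbb{Z}_N^2:\ \omega_1 \ne \omega_2} = \sum_{\omega \in \mathbb{Z}_N^2} - \sum_{\omega \in \mathbb{Z}_N^2:\ \omega_1 = \omega_2}.$$

The motivation is quite clear. Removing the exclusion allows us to rewrite sums as products,
e.g.

$$\sum_{\omega \in \mathbb{Z}_N^2} = \sum_{\omega_1} e^{i\omega_1(t_1 - t_2)} \cdot \sum_{\omega_2} e^{i\omega_2(t_2 - t_1)},$$

and each factor is equal to either $N$ or $0$ depending on whether $t_1 = t_2$ or not.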
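The dichotomy "each factor is either $N$ or $0$" is the orthogonality of characters on $\mathbb{Z}_N$. A minimal check, assuming frequencies are indexed by integers and the kernel is normalized as $e^{2\pi i \omega u/N}$ (the function name is hypothetical):

```python
import cmath

def character_sum(N, u):
    """Sum over w in Z_N of exp(2*pi*i*w*u/N)."""
    return sum(cmath.exp(2j * cmath.pi * w * u / N) for w in range(N))

N = 12
sums = {u: character_sum(N, u) for u in range(-N, N + 1)}
```

Up to rounding, the sum equals $N$ when $u$ is a multiple of $N$ (in particular when $t_1 = t_2$) and vanishes otherwise.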

The next section generalizes these ideas and develops an identity which allows us to rewrite
sums over $\Omega(\sim)$ in terms of sums over $\Omega_\le(\sim)$.



4.2 Inclusion-Exclusion formulae 

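Lemma 4.1 (Inclusion-Exclusion principle for equivalence classes) Let $A$ and $G$
be non-empty finite sets. For any equivalence class $\sim \in \mathcal{P}(A)$ on $\omega \in G^{|A|}$, we have

$$\sum_{\omega \in \Omega(\sim)} f(\omega) = \sum_{\sim_1 \in \mathcal{P}:\ \sim_1 \le \sim} (-1)^{|A/\sim| - |A/\sim_1|} \Big(\prod_{A' \in A/\sim_1} (|A'/\sim| - 1)!\Big) \sum_{\omega \in \Omega_\le(\sim_1)} f(\omega). \qquad (4.3)$$

Thus, for instance, if $A = \{1, 2, 3\}$ and $\sim$ is the equality relation, i.e. $j \sim k$ if and only if
$j = k$, this identity is saying that

$$\sum_{\omega_1,\omega_2,\omega_3 \in G:\ \omega_1,\omega_2,\omega_3 \text{ distinct}} = \sum_{\omega_1,\omega_2,\omega_3 \in G} - \sum_{\omega_1,\omega_2,\omega_3 \in G:\ \omega_1 = \omega_2} - \sum_{\omega_1,\omega_2,\omega_3 \in G:\ \omega_2 = \omega_3} - \sum_{\omega_1,\omega_2,\omega_3 \in G:\ \omega_3 = \omega_1} + 2 \sum_{\omega_1,\omega_2,\omega_3 \in G:\ \omega_1 = \omega_2 = \omega_3}$$

where we have omitted the summands $f(\omega_1, \omega_2, \omega_3)$ for brevity.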
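The three-element instance of the identity can be verified by direct enumeration. In this sketch the summand `f` is an arbitrary test function chosen for illustration, and $G = \{0, \dots, 4\}$ is an arbitrary small alphabet.

```python
from itertools import product

def f(w1, w2, w3):
    # Arbitrary test summand (hypothetical); any function on G^3 would do.
    return (w1 + 1) * (2 * w2 + 3) + w3 ** 2 - w1 * w3

def both_sides(G):
    """Return (sum over distinct triples, inclusion-exclusion expression)."""
    triples = list(product(G, repeat=3))
    lhs = sum(f(*w) for w in triples if len(set(w)) == 3)
    total = sum(f(*w) for w in triples)
    eq12 = sum(f(*w) for w in triples if w[0] == w[1])
    eq23 = sum(f(*w) for w in triples if w[1] == w[2])
    eq31 = sum(f(*w) for w in triples if w[2] == w[0])
    eq_all = sum(f(*w) for w in triples if w[0] == w[1] == w[2])
    return lhs, total - eq12 - eq23 - eq31 + 2 * eq_all

lhs, rhs = both_sides(range(5))
```

The coefficient $+2$ is what makes the all-equal triples cancel: each is counted once in the full sum, removed three times by the pairwise terms, and restored twice.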

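Proof By passing from $A$ to the quotient space $A/\sim$ if necessary we may assume that $\sim$
is the equality relation $=$. Now relabeling $A$ as $\{1, \dots, n\}$, $\sim_1$ as $\sim$, and $A'$ as $A$, it suffices
to show that

$$\sum_{\omega \in G^n:\ \omega_1,\dots,\omega_n \text{ distinct}} f(\omega) = \sum_{\sim \in \mathcal{P}(\{1,\dots,n\})} (-1)^{n - |\{1,\dots,n\}/\sim|} \Big(\prod_{A \in \{1,\dots,n\}/\sim} (|A| - 1)!\Big) \sum_{\omega \in \Omega_\le(\sim)} f(\omega). \qquad (4.4)$$

We prove this by induction on $n$. When $n = 1$ both sides are equal to $\sum_{\omega \in G} f(\omega)$. Now
suppose inductively that $n > 1$ and the claim has already been proven for $n - 1$. We observe
that the left-hand side of (4.4) can be rewritten as

$$\sum_{\omega' \in G^{n-1}:\ \omega_1,\dots,\omega_{n-1} \text{ distinct}} \Big(\sum_{\omega_n \in G} f(\omega', \omega_n) - \sum_{j=1}^{n-1} f(\omega', \omega_j)\Big)$$

where $\omega' := (\omega_1, \dots, \omega_{n-1})$. Applying the inductive hypothesis, this can be written as

$$\sum_{\sim' \in \mathcal{P}(\{1,\dots,n-1\})} (-1)^{n-1-|\{1,\dots,n-1\}/\sim'|} \prod_{A' \in \{1,\dots,n-1\}/\sim'} (|A'| - 1)! \sum_{\omega' \in \Omega_\le(\sim')} \Big(\sum_{\omega_n \in G} f(\omega', \omega_n) - \sum_{1 \le j < n} f(\omega', \omega_j)\Big). \qquad (4.5)$$

Now we work on the right-hand side of (4.4). If $\sim$ is an equivalence class on $\{1, \dots, n\}$,
let $\sim'$ be the restriction of $\sim$ to $\{1, \dots, n - 1\}$. Observe that $\sim$ can be formed from $\sim'$
either by adjoining the singleton set $\{n\}$ as a new equivalence class (in which case we write
$\sim = \{\sim', \{n\}\}$), or by choosing a $j \in \{1, \dots, n - 1\}$ and declaring $n$ to be equivalent to $j$ (in
which case we write $\sim = \{\sim', \{n\}\}/(j = n)$). Note that the latter construction can recover
the same equivalence class $\sim$ in multiple ways if the equivalence class $[j]_{\sim'}$ of $j$ in $\sim'$ has
size larger than 1; however, we can resolve this by weighting each $j$ by $\frac{1}{|[j]_{\sim'}|}$. Thus we have
the identity

$$\sum_{\sim \in \mathcal{P}(\{1,\dots,n\})} F(\sim) = \sum_{\sim' \in \mathcal{P}(\{1,\dots,n-1\})} F(\{\sim', \{n\}\}) + \sum_{\sim' \in \mathcal{P}(\{1,\dots,n-1\})} \sum_{j=1}^{n-1} \frac{1}{|[j]_{\sim'}|}\, F(\{\sim', \{n\}\}/(j = n))$$

for any complex-valued function $F$ on $\mathcal{P}(\{1, \dots, n\})$. Applying this to the right-hand side
of (4.4), we see that we may rewrite this expression as the sum of

$$\sum_{\sim' \in \mathcal{P}(\{1,\dots,n-1\})} (-1)^{n - (|\{1,\dots,n-1\}/\sim'| + 1)} \prod_{A' \in \{1,\dots,n-1\}/\sim'} (|A'| - 1)! \sum_{\omega' \in \Omega_\le(\sim')} \sum_{\omega_n \in G} f(\omega', \omega_n)$$

and

$$\sum_{\sim' \in \mathcal{P}(\{1,\dots,n-1\})} (-1)^{n - |\{1,\dots,n-1\}/\sim'|} \sum_{j=1}^{n-1} T(j) \sum_{\omega' \in \Omega_\le(\sim')} f(\omega', \omega_j),$$

where we adopt the convention $\omega' = (\omega_1, \dots, \omega_{n-1})$. But observe that

$$T(j) := \frac{1}{|[j]_{\sim'}|} \prod_{A \in \{1,\dots,n\}/(\{\sim', \{n\}\}/(j = n))} (|A| - 1)! = \prod_{A' \in \{1,\dots,n-1\}/\sim'} (|A'| - 1)!$$

and thus the right-hand side of (4.4) matches (4.5) as desired.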



4.3 Stirling Numbers 

As emphasized earlier, our goal is to use our inclusion-exclusion formula to rewrite the sum
(4.2) as a sum over $\Omega_\le(\sim)$. In order to do this, it is best to introduce another element of
combinatorics, which will prove to be very useful.

For any $n, k \ge 0$, we define the Stirling number of the second kind $S(n, k)$ to be the number
of equivalence relations on a set of $n$ elements which have exactly $k$ equivalence classes,
thus

$$S(n, k) := \#\{\sim \in \mathcal{P}(A) : |A/\sim| = k\}.$$

Thus for instance $S(0,0) = S(1,1) = S(2,1) = S(2,2) = 1$, $S(3,2) = 3$, and so forth. We
observe the basic recurrence

$$S(n + 1, k) = S(n, k - 1) + k\,S(n, k) \quad \text{for all } k, n \ge 0. \qquad (4.6)$$

This simply reflects the fact that if $a$ is an element of $A$ and $\sim$ is an equivalence relation
on $A$ with $k$ equivalence classes, then either $a$ is not equivalent to any other element of $A$
(in which case $\sim$ has $k - 1$ equivalence classes on $A \setminus \{a\}$), or $a$ is equivalent to one of the
$k$ equivalence classes of $A \setminus \{a\}$.

We now need an identity for the Stirling numbers.¹

Lemma 4.2 For any $n \ge 1$ and $0 \le \tau < 1/2$, we have the identity

$$\sum_{k=1}^{n} (k - 1)!\, S(n, k)\, (-1)^{n-k} \tau^k = \sum_{k=1}^{\infty} (-1)^{n-k} \frac{\tau^k k^{n-1}}{(1 - \tau)^k}. \qquad (4.7)$$

Note that the condition $0 \le \tau < 1/2$ ensures that the right-hand side is convergent.
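The identity (4.7) lends itself to a quick numerical sanity check. In the sketch below the Stirling numbers are generated from the recurrence (4.6), the infinite series is truncated after a fixed number of terms (an assumption that is harmless for the moderate $n$ and $\tau$ tested), and the function names are hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind via S(n+1,k) = S(n,k-1) + k*S(n,k)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def F_finite(n, tau):
    """Left-hand side of (4.7): the finite sum over Stirling numbers."""
    return sum(math.factorial(k - 1) * stirling2(n, k) * (-1) ** (n - k) * tau ** k
               for k in range(1, n + 1))

def F_series(n, tau, terms=400):
    """Right-hand side of (4.7), truncated after `terms` terms."""
    x = tau / (1 - tau)
    return sum((-1) ** (n - k) * k ** (n - 1) * x ** k for k in range(1, terms + 1))

# Largest discrepancy between the two sides over a small grid of (n, tau).
max_gap = max(abs(F_finite(n, tau) - F_series(n, tau))
              for n in range(1, 7) for tau in (0.1, 0.2, 0.4))
```

For example, $n = 3$ and $\tau = 0.2$ give $0.2 - 3(0.2)^2 + 2(0.2)^3 = 0.096$ on the left-hand side, and the truncated series agrees to rounding error.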



Proof We prove this by induction on $n$. When $n = 1$ the left-hand side is equal to $\tau$, and
the right-hand side is equal to

$$\sum_{k=1}^{\infty} (-1)^{k+1} \frac{\tau^k}{(1 - \tau)^k} = 1 - \sum_{k=0}^{\infty} \Big(\frac{\tau}{\tau - 1}\Big)^k = 1 + (\tau - 1) = \tau$$

as desired. Now suppose inductively that $n \ge 1$ and the claim has already been proven for
$n$. Applying the operator $(\tau^2 - \tau)\frac{d}{d\tau}$ to both sides (which can be justified by the hypothesis
$0 \le \tau < 1/2$) we obtain (after some computation)

$$\sum_{k=1}^{n+1} (k - 1)!\,(S(n, k - 1) + k\,S(n, k))\,(-1)^{n+1-k} \tau^k = \sum_{k=1}^{\infty} (-1)^{n+1-k} \frac{\tau^k k^n}{(1 - \tau)^k},$$

and the claim follows from (4.6). ■

We shall refer to the quantity in (4.7) as $F_n(\tau)$, thus

$$F_n(\tau) = \sum_{k=1}^{n} (k - 1)!\, S(n, k)\, (-1)^{n-k} \tau^k = \sum_{k=1}^{\infty} (-1)^{n-k} \frac{\tau^k k^{n-1}}{(1 - \tau)^k}. \qquad (4.8)$$

Thus we have

$$F_1(\tau) = \tau, \quad F_2(\tau) = -\tau + \tau^2, \quad F_3(\tau) = \tau - 3\tau^2 + 2\tau^3,$$

and so forth. When $\tau$ is small we have the approximation $F_n(\tau) \approx (-1)^{n+1} \tau$, which is
worth keeping in mind. Some more rigorous bounds in this spirit are as follows.

¹ We found this identity by modifying a standard generating function identity for the Stirling numbers
which involved the polylogarithm. It can also be obtained from the formula $S(n, k) = \frac{1}{k!} \sum_{i=0}^{k-1} (-1)^i \binom{k}{i} (k - i)^n$,
which can be verified inductively from (4.6).

Lemma 4.3 Let $n \ge 1$ and $0 \le \tau < 1/2$. If $\frac{\tau}{1-\tau} \le e^{1-n}$, then we have $|F_n(\tau)| \le \frac{\tau}{1-\tau}$. If
instead $\frac{\tau}{1-\tau} > e^{1-n}$, then

$$|F_n(\tau)| \le \exp\Big((n - 1)\Big(\log(n - 1) - \log\log\frac{1 - \tau}{\tau} - 1\Big)\Big).$$

Proof Elementary calculus shows that for $x > 0$, the function $g(x) = \frac{\tau^x x^{n-1}}{(1-\tau)^x}$ is increasing
for $x < x_*$ and decreasing for $x > x_*$, where $x_* := (n - 1)/\log\frac{1-\tau}{\tau}$. If $\frac{\tau}{1-\tau} \le e^{1-n}$, then
$x_* \le 1$ and so the alternating series $F_n(\tau) = \sum_{k=1}^{\infty} (-1)^{n+k} g(k)$ has magnitude at most
$g(1) = \frac{\tau}{1-\tau}$. Otherwise the series has magnitude at most

$$g(x_*) = \exp\Big((n - 1)\Big(\log(n - 1) - \log\log\frac{1 - \tau}{\tau} - 1\Big)\Big)$$

and the claim follows. ■

Roughly speaking, this means that $F_n(\tau)$ behaves like $\tau$ for $n = O(\log[1/\tau])$ and behaves
like $(n/\log[1/\tau])^n$ for $n \gg \log[1/\tau]$.
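The two regimes of Lemma 4.3 can be explored numerically. The sketch below is a spot check on a small grid, not a proof: `F` evaluates $F_n(\tau)$ from the finite Stirling-number sum, `G` encodes the bound of the lemma as a function of $u = \tau/(1-\tau)$, and both names are hypothetical.

```python
import math

def F(n, tau):
    """F_n(tau) from the finite Stirling-number sum; S(n, .) is built row by row."""
    row = [1]                                   # S(0, 0)
    for _ in range(n):                          # S(m+1, k) = S(m, k-1) + k * S(m, k)
        row = [(row[k - 1] if k >= 1 else 0) + k * (row[k] if k < len(row) else 0)
               for k in range(len(row) + 1)]
    return sum(math.factorial(k - 1) * row[k] * (-1) ** (n - k) * tau ** k
               for k in range(1, n + 1))

def G(u, n):
    """Bound of Lemma 4.3 with u = tau / (1 - tau), 0 < u < 1."""
    if math.log(u) <= 1 - n:
        return u
    return math.exp((n - 1) * (math.log(n - 1) - math.log(math.log(1 / u)) - 1))

# Grid points (if any) where the bound would fail, allowing for rounding error.
violations = [(n, tau)
              for n in range(1, 13)
              for tau in (0.05, 0.1, 0.2, 0.3, 0.4, 0.45)
              if abs(F(n, tau)) > G(tau / (1 - tau), n) + 1e-9]
```

For small $n$ the bound is just $\tau/(1-\tau)$; once $n$ exceeds roughly $1 + \log\frac{1-\tau}{\tau}$ the second branch takes over and grows rapidly with $n$.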



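4.4 A Second Formula for the Expected Value of the Trace of $H_0^{2n}$

Let us return to (4.2). The inner sum of (4.2) can be rewritten as

$$\sum_{\sim \in \mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega \in \Omega(\sim)} f(\omega)$$

with $f(\omega) := e^{i\sum_{1 \le j \le 2n} \omega_j (t_j - t_{j+1})}$. We prove the following useful identity:

Lemma 4.4

$$\sum_{\sim \in \mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega \in \Omega(\sim)} f(\omega) = \sum_{\sim_1 \in \mathcal{P}(A)} \Big(\sum_{\omega \in \Omega_\le(\sim_1)} f(\omega)\Big) \prod_{A' \in A/\sim_1} F_{|A'|}(\tau). \qquad (4.9)$$

Proof Applying (4.3) and rearranging, we may rewrite this as

$$\sum_{\sim_1 \in \mathcal{P}(A)} T(\sim_1) \sum_{\omega \in \Omega_\le(\sim_1)} f(\omega),$$

where

$$T(\sim_1) = \sum_{\sim \in \mathcal{P}(A):\ \sim \ge \sim_1} \tau^{|A/\sim|} (-1)^{|A/\sim| - |A/\sim_1|} \prod_{A' \in A/\sim_1} (|A'/\sim| - 1)!.$$

Splitting $A$ into equivalence classes $A'$ of $A/\sim_1$, we observe that

$$T(\sim_1) = \prod_{A' \in A/\sim_1}\ \sum_{\sim' \in \mathcal{P}(A')} \tau^{|A'/\sim'|} (-1)^{|A'/\sim'| - |A'|} (|A'/\sim'| - 1)!;$$

splitting $\sim'$ based on the number of equivalence classes $|A'/\sim'|$, we can write this as

$$\prod_{A' \in A/\sim_1}\ \sum_{k=1}^{|A'|} S(|A'|, k)\, \tau^k (-1)^{|A'| - k} (k - 1)! = \prod_{A' \in A/\sim_1} F_{|A'|}(\tau)$$

by (4.8). Gathering all this together, we have proven the identity (4.9). ■

We specialize (4.9) to the function $f(\omega) := \exp(i \sum_{1 \le j \le 2n} \omega_j (t_j - t_{j+1}))$ and obtain

$$E[\mathrm{Tr}(H_0^{2n})] = \sum_{\sim \in \mathcal{P}(A)}\ \sum_{t_1,\dots,t_{2n} \in T:\ t_j \ne t_{j+1}}\ \sum_{\omega \in \Omega_\le(\sim)} e^{i\sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})} \prod_{A' \in A/\sim} F_{|A'|}(\tau).$$

We now compute

$$I(\sim) = \sum_{\omega \in \Omega_\le(\sim)} e^{i\sum_{1 \le j \le 2n} \omega_j (t_j - t_{j+1})}.$$

For every equivalence class $A' \in A/\sim$, let $t_{A'}$ denote the expression $t_{A'} := \sum_{a \in A'} (t_a - t_{a+1})$,
and let $\omega_{A'}$ denote the expression $\omega_{A'} := \omega_a$ for any $a \in A'$ (these are all equal since
$\omega \in \Omega_\le(\sim)$). Then

$$I(\sim) = \sum_{(\omega_{A'})_{A' \in A/\sim} \in \mathbb{Z}_N^{|A/\sim|}} e^{i\sum_{A' \in A/\sim} \omega_{A'} t_{A'}} = \prod_{A' \in A/\sim}\ \sum_{\omega_{A'} \in \mathbb{Z}_N} e^{i\omega_{A'} t_{A'}}. \qquad (4.10)$$

We now see the importance of (4.10) as the inner sum equals $|\mathbb{Z}_N| = N$ when $t_{A'} = 0$ and
vanishes otherwise. Hence, we proved the following: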



Lemma 4.5 For every equivalence class $A' \in A/\sim$, let $t_{A'} := \sum_{a \in A'} (t_a - t_{a+1})$. Then

$$E[\mathrm{Tr}(H_0^{2n})] = \sum_{\sim \in \mathcal{P}(A)}\ \sum_{t \in T^{2n}:\ t_j \ne t_{j+1} \text{ and } t_{A'} = 0 \text{ for all } A'} N^{|A/\sim|} \prod_{A' \in A/\sim} F_{|A'|}(\tau). \qquad (4.11)$$

This formula will serve as a basis for all of our estimates. In particular, because of the
constraint $t_j \ne t_{j+1}$, we see that the summand vanishes if $A/\sim$ contains any singleton
equivalence classes. This means, in passing, that the only equivalence classes which contribute
to the sum obey $|A/\sim| \le n$.



4.5 A First Bound on $E[\mathrm{Tr}(H_0^{2n})]$

Let $\sim$ be an equivalence which does not contain any singleton. Then the following inequality
holds

$$\#\{t \in T^{2n} : t_{A'} = 0 \text{ for all } A' \in A/\sim\} \le |T|^{2n - |A/\sim| + 1}.$$

To see why this is true, observe that as linear combinations of $t_1, \dots, t_{2n}$, the expressions
$t_j - t_{j+1}$ are all linearly independent of each other except for the constraint $\sum_{j=1}^{2n} (t_j - t_{j+1}) =
0$. Thus we have $|A/\sim| - 1$ independent constraints in the above sum, and so the number
of $t$'s obeying the constraints is bounded by $|T|^{2n - |A/\sim| + 1}$.

All the equivalence classes in the sum (4.11) are without singletons as otherwise $t_{A'} \ne 0$.
Thus, for $n, k \ge 0$, we let $P(n, k)$ be the number of equivalence classes on a set of $n$ elements
which have exactly $k$ equivalence classes and no singletons

$$P(n, k) := \#\{\sim \in \mathcal{P}(A) : |A/\sim| = k \text{ and } |A'| \ge 2,\ \forall A' \in A/\sim\}.$$

There is a simple recursion on these numbers, namely,

$$P(n, k) = P(n - 1, k) + (n - 1)\,P(n - 2, k - 1), \qquad (4.12)$$

which is valid for all $n, k \ge 0$. This simply reflects the fact that if $a$ is an element of $A$
and $\sim$ is an equivalence relation on $A$ with $k$ equivalence classes, then either (1) $a$ belongs
to a class which has only one other element $\beta$ of $A$ (in which case $\sim$ has $k - 1$ equivalence
classes and no singleton on $A \setminus \{a, \beta\}$), or (2) $a$ is equivalent to one of the $k$ equivalence classes
of $A \setminus \{a\}$, each of which has at least two elements.

With these notations, we established

$$E\,\mathrm{Tr}(H_0^{2n}) \le \sum_{k=1}^{n} N^k |T|^{2n-k+1} P(2n, k) \sup_{\sim:\ |A/\sim| = k}\ \prod_{A' \in A/\sim} F_{|A'|}(\tau). \qquad (4.13)$$

The following lemma provides an upper bound on those $P(n, k)$'s.

Lemma 4.6 The numbers $P(n, k)$ obey

$$P(n, k) \le \lambda^n (n - 1) \cdots (n - 2k + 1), \quad \forall \lambda \ge \phi := \frac{1 + \sqrt{5}}{2}. \qquad (4.14)$$

Proof The proof operates by induction. The bound (4.14) is obvious for $n = 1$. Suppose
the claim is established for all pairs $(m, k)$ with $m \le n$. We will show that this implies the
property for $m = n + 1$. Indeed,

$$P(n + 1, k) = P(n - 1, k) + (n - 1)\,P(n - 2, k - 1)$$
$$\le \lambda^{n-1} (n - 2) \cdots (n - 2k + 2) + \lambda^{n-2} (n - 1) \cdots (n - 2k + 1)$$
$$\le (\lambda^{n-1} + \lambda^{n-2})(n - 1) \cdots (n - 2k + 1).$$

The claim follows since for $\lambda \ge \phi$ we have $\lambda^{n-1} + \lambda^{n-2} \le \lambda^n$. ■

This lemma gives us an idea of how large the $P(2n, k)$'s appearing in the sum (4.13) really
are. To derive an upper bound on the whole sum, we also need to understand the behavior
of $\prod_{A' \in A/\sim} F_{|A'|}(\tau)$. This is the subject of our next section.

4.6 Convex analysis 

We start with a useful and classical lemma. 

Lemma 4.7 Let $f$ be a convex function on $[0, 1]$, say. Consider the problem

$$f^* = \max \sum_{j=1}^{k} f(x_j), \quad \text{subject to } x_j \ge 0 \text{ and } \sum_{j=1}^{k} x_j = 1. \qquad (4.15)$$

Then the maximum value $f^*$ is obtained by allocating one $x_j$ to 1 and all the others to 0,
i.e. $f^* = (k - 1) f(0) + f(1)$.

Proof For each $x_j$, $0 \le x_j \le 1$, the convexity of $f$ implies

$$f(x_j) \le (1 - x_j) f(0) + x_j f(1).$$

Summing this inequality over all indices gives

$$\sum_{j=1}^{k} f(x_j) \le (k - 1) f(0) + f(1),$$

which is what we sought to establish. ■

Corollary 4.8 Suppose that $f = \log F$ is a convex function on $[0, 1]$, say, and consider

$$F^* = \max \prod_{j=1}^{k} F(x_j), \quad \text{subject to } x_j \ge 0 \text{ and } \sum_{j=1}^{k} x_j = 1. \qquad (4.16)$$

Then the maximum value $F^*$ is obtained by allocating one $x_j$ to 1 and all the others to 0,
i.e. $F^* = (F(0))^{k-1} F(1)$.

Proof Take the logarithm of $\prod_{j=1}^{k} F(x_j)$ and apply Lemma 4.7. ■

Note that both the lemma and the corollary hold for 'discrete' functions; that is, suppose
that $f(j)$ obeys

$$f(j + 1) - f(j) \ge f(j) - f(j - 1), \quad j = 0, 1, 2, \dots. \qquad (4.17)$$

Then the maximum value of $\sum_{j=1}^{k} f(n_j)$, where the $n_j$'s are now integer values obeying
$n_j \ge 0$ and $\sum_{j=1}^{k} n_j = n$, is of course achieved by taking all the $n_j$'s equal to zero but one
equal to $n$.
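The discrete version of the lemma can be confirmed by exhaustive search on a small instance. In this sketch the convex test function $f(j) = j^2 - 3j$ and the sizes $n = 7$, $k = 3$ are arbitrary illustrative choices, and `max_sum` is a hypothetical helper.

```python
from itertools import product

def f(j):
    # Discrete convex test function: its second differences are constant and equal to 2.
    return j * j - 3 * j

def max_sum(func, n, k):
    """Brute-force maximum of sum func(n_j) over integers n_j >= 0 with n_1 + ... + n_k = n."""
    return max(sum(func(x) for x in parts)
               for parts in product(range(n + 1), repeat=k)
               if sum(parts) == n)

n, k = 7, 3
best = max_sum(f, n, k)
```

The search returns $(k-1) f(0) + f(n)$, i.e. the value obtained by putting all the mass on a single index, as the lemma predicts.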

With these preliminaries in place, recall now the bound obtained in Lemma 4.3

$$F_n(\tau) \le G_{\tau/(1-\tau)}(n)$$

where

$$G_u(n) = \begin{cases} u, & \log u \le 1 - n, \\ \exp((n - 1)(\log(n - 1) - \log\log(1/u) - 1)), & \log u > 1 - n. \end{cases}$$

Note that we voluntarily exchanged the subscripts, namely, $\tau$ and $n$, to reflect the idea that
we shall view $G$ as a function of $n$ while $\tau$ will serve as a parameter. It is clear that $\log G$
is convex and, therefore,

$$G_u^* = \max \prod_{j=1}^{k} G_u(n_j), \quad \text{subject to } n_j \ge 2 \text{ and } \sum_{j=1}^{k} n_j = 2n$$

obeys

$$G_u^* = (G_u(2))^{k-1}\, G_u(2n - 2k + 2).$$

Set $G = G_{\tau/(1-\tau)}$ for short. Then for any equivalence class such that $|A/\sim| = k$, the
above argument yields

$$\prod_{A' \in A/\sim} F_{|A'|}(\tau) \le \prod_{A' \in A/\sim} G(|A'|) \le [G(2)]^{k-1}\, G(2n - 2k + 2),$$

which, on the one hand, gives

$$E\,\mathrm{Tr}(H_0^{2n}) \le \sum_{k=1}^{n} N^k |T|^{2n-k+1} P(2n, k)\, [G(2)]^{k-1}\, G(2n - 2k + 2).$$

On the other hand, $P(2n, k) \le \phi^{2n} (2n - 1) \cdots (2n - 2k + 1)$ (see Lemma 4.6) and, therefore,

$$E\,\mathrm{Tr}(H_0^{2n}) \le |T|\, \phi^{2n} \sum_{k=1}^{n} f(k), \qquad (4.19)$$

where

$$f(k) := N^k |T|^{2n-k}\, [(2n - 1) \cdots (2n - 2k + 1)]\, [G(2)]^{k-1}\, G(2n - 2k + 2). \qquad (4.20)$$

We prove that the summand $f$ is in some sense convex.

Lemma 4.9 For each $k \le n - 1$, $f$ obeys

$$f(k + 1) - f(k) \ge f(k) - f(k - 1).$$

As a consequence of this lemma, the maximum of $f(k)$, $1 \le k \le n$, is of course attained at
either the left end point ($k = 1$) or the right end point ($k = n$); in short,

$$f(k) \le \max(f(1), f(n)), \quad \forall\, 1 \le k \le n.$$

Proof We need to establish that for each $1 < k \le n - 1$,

$$\frac{f(k + 1)}{f(k)} + \frac{f(k - 1)}{f(k)} \ge 2.$$

Observe that

$$\frac{f(k + 1)}{f(k)} = a\,\rho_{k+1}, \qquad \frac{f(k - 1)}{f(k)} = a^{-1} \rho_{k-1}$$

with $a = N G(2)/|T|$ and

$$\rho_{k+1} = (2n - 2k - 1) \cdot \frac{G(2n - 2k)}{G(2n - 2k + 2)}, \qquad \rho_{k-1} = \frac{1}{2n - 2k + 1} \cdot \frac{G(2n - 2k + 4)}{G(2n - 2k + 2)}.$$

Clearly

$$a\,\rho_{k+1} + a^{-1} \rho_{k-1} \ge 2\sqrt{\rho_{k+1}\,\rho_{k-1}}$$

and, therefore, it is sufficient to establish that $\rho_{k+1}\,\rho_{k-1} \ge 1$. Put $m = 2(n - k)$; then

$$\rho_{k+1}\,\rho_{k-1} = \frac{m - 1}{m + 1} \cdot \frac{G(m)\, G(m + 4)}{[G(m + 2)]^2} = \frac{m - 1}{m + 1} \cdot \frac{(m - 1)^{m-1} (m + 3)^{m+3}}{(m + 1)^{2(m+1)}}.$$

It is now a simple exercise to check that for each $m \ge 2$, the logarithm of the right-hand side is
nonnegative, i.e.

$$m \log(m - 1) + (m + 3) \log(m + 3) - (2m + 3) \log(m + 1) \ge 0.$$

We omit the proof of this fact. ■
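Although the proof of the final inequality is omitted, it is easy to probe numerically. The sketch below evaluates the left-hand side over a finite range of $m$ only, so it is a sanity check rather than a proof; `h` is a hypothetical name.

```python
import math

def h(m):
    """m*log(m-1) + (m+3)*log(m+3) - (2m+3)*log(m+1), defined for m >= 2."""
    return m * math.log(m - 1) + (m + 3) * math.log(m + 3) - (2 * m + 3) * math.log(m + 1)

values = [h(m) for m in range(2, 2001)]
```

The quantity is about $0.357$ at $m = 2$ and decays roughly like $2/m$, staying positive throughout the tested range.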
4.7 Proof of Theorem 3.1

The previous section established

$$E\,\mathrm{Tr}(H_0^{2n}) \le |T|\, \phi^{2n} \cdot n \cdot \max(f(1), f(n))$$

where, letting $c_\tau := e \log((1 - \tau)/\tau)$,

$$f(1) = (2n - 1)\, G(2n)\, N\, |T|^{2n-1} = (2n - 1)^{2n}\, c_\tau^{-(2n-1)}\, N\, |T|^{2n-1}$$

and

$$f(n) = [(2n - 1) \times (2n - 3) \times \cdots \times 1]\, [G(2)]^n N^n |T|^n = \frac{(2n)!}{n!\,2^n} \Big(\frac{\tau}{1 - \tau}\Big)^n N^n |T|^n.$$

This is exactly the content of Theorem 3.1.

5 Numerical Experiments 

In this section, we present numerical experiments in order to derive empirical bounds on $|T|$ relative to $|\Omega|$ for a signal $f$ supported on $T$ to be the unique minimizer of $(P_1)$. The results can be viewed as a set of practical guidelines for situations where one can expect perfect recovery from partial Fourier information using convex optimization.

Our experiments are of the following form:

1. Choose constants $N$ (the length of the signal), $N_t$ (the number of spikes in the signal), and $N_\omega$ (the number of observed frequencies).

2. Randomly generate the subdomain $T$ by sampling $\{0, \ldots, N-1\}$ $N_t$ times without replacement (we have $|T| = N_t$).

3. Randomly generate $f$ by setting $f(t) = 0$, $t \in T^c$, and drawing both the real and imaginary parts of $f(t)$, $t \in T$, from independent Gaussian distributions with mean zero and variance one.²

4. Randomly generate the subdomain $\Omega$ of observed frequencies by again sampling $\{0, \ldots, N-1\}$ $N_\omega$ times without replacement ($|\Omega| = N_\omega$).

5. Solve $(P_1)$, and compare the solution to $f$.
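Steps 1 to 4 of the protocol above can be sketched in a few lines of NumPy. This is an illustrative setup only: the function names, the seed handling and the tolerance used to declare a recovery exact are our assumptions, and step 5 (solving $(P_1)$) is left to whatever solver is used. `np.fft.fft` uses the same sign convention as the Fourier transform defined in the introduction.

```python
import numpy as np

def make_experiment(N, N_t, N_w, seed=0):
    """Draw a random support T, a random signal f on T, and random frequencies Omega."""
    rng = np.random.default_rng(seed)
    T = rng.choice(N, size=N_t, replace=False)        # step 2: support, |T| = N_t
    f = np.zeros(N, dtype=complex)
    # Step 3: independent standard Gaussian real and imaginary parts on T.
    f[T] = rng.standard_normal(N_t) + 1j * rng.standard_normal(N_t)
    Omega = rng.choice(N, size=N_w, replace=False)    # step 4: |Omega| = N_w
    f_hat_obs = np.fft.fft(f)[Omega]                  # the observed data
    return T, f, Omega, f_hat_obs

def recovered(f, g, tol=1e-6):
    """Assumed success criterion: relative l2 error below `tol` (tolerance is our choice)."""
    return bool(np.linalg.norm(f - g) <= tol * max(1.0, np.linalg.norm(f)))
```

Repeating `make_experiment` over a grid of `(N_t, N_w)` pairs with different seeds, and recording how often `recovered` holds for the solver output, gives an empirical recovery-rate map of the kind reported in this section.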

The $\ell_1$-norm is not strictly convex, so solving $(P_1)$ using a Newton-type method that relies on local quadratic approximations of $\|\cdot\|_{\ell_1}$ is problematic. Instead, we use a very simple gradient descent with projection algorithm. The number of iterations needed for convergence is high (on the order of $10^5$), but since we can rapidly project onto the constraint set (using two fast Fourier transforms), each iteration takes a short amount of time. As an indication, the algorithm typically converges in less than 10 seconds on a standard desktop computer for signals of length $N = 1024$.
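The projection onto the constraint set mentioned above can be written with exactly two FFTs: transform, overwrite the entries on $\Omega$ with the observed coefficients, and transform back. The sketch below pairs this projection with a plain subgradient step on the $\ell_1$ norm. It is an assumption-laden illustration rather than the implementation used for the experiments: the step-size rule `step0 / sqrt(it)`, the iteration count and the function names are our own choices, and no convergence guarantee is claimed.

```python
import numpy as np

def project(g, Omega, f_hat_obs):
    """Orthogonal projection onto {g : g_hat(w) = f_hat(w) for w in Omega} (two FFTs)."""
    g_hat = np.fft.fft(g)
    g_hat[Omega] = f_hat_obs          # enforce the observed Fourier coefficients
    return np.fft.ifft(g_hat)

def l1_subgradient(g, eps=1e-12):
    """A subgradient of the l1 norm at complex g: g/|g| where g != 0, and 0 elsewhere."""
    mag = np.abs(g)
    return np.where(mag > eps, g / np.maximum(mag, eps), 0)

def projected_descent(Omega, f_hat_obs, N, iters=1000, step0=1.0):
    """Projected subgradient iterations; the step-size rule is an assumption."""
    # Start from the minimum-energy point (unobserved frequencies set to zero).
    g = project(np.zeros(N, dtype=complex), Omega, f_hat_obs)
    for it in range(1, iters + 1):
        step = step0 / np.sqrt(it)
        g = project(g - step * l1_subgradient(g), Omega, f_hat_obs)
    return g
```

Because the discrete Fourier transform is unitary up to scaling, overwriting coefficients in the frequency domain is an orthogonal projection in the signal domain, which is why every iterate stays feasible.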

² The results here, as in the rest of the paper, seem to rely only on the sets $T$ and $\Omega$. The actual values that $f$ takes on $T$ can be arbitrary; choosing them to be random emphasizes this. Figures [5] remain the same if we take $f(t) = 1$, $t \in T$, say.
Figure 2: Recovery experiment for $N = 512$. (a) The image intensity represents the percentage of the time solving $(P_1)$ recovered the signal $f$ exactly as a function of $|\Omega|$ (vertical axis) and $|T|/|\Omega|$ (horizontal axis); in white regions, the signal is recovered approximately 100% of the time, in black regions, the signal is never recovered. For each $|T|$, $|\Omega|$ pair, 100 experiments were run. (b) Cross-section of the image in (a) at $|\Omega| = 64$. We can see that we have perfect recovery with very high probability for $|T| < 16$.

Figure 2 illustrates the recovery rate for varying values of $|T|$ and $|\Omega|$ for $N = 512$. From the plot, we can see that for $|\Omega| > 32$, if $|T| < |\Omega|/5$, we recover $f$ perfectly about 80% of the time. For $|T| < |\Omega|/8$, the recovery rate is practically 100%. We remark that these numerical results are consistent with earlier findings [2].

One source of slack in the theoretical analysis is the way in which we choose the polynomial $P(t)$ (as in (2.11)). Theorem 2.1 states that $f$ is a minimizer of $(P_1)$ if and only if there exists a trigonometric polynomial that has $P(t) = \mathrm{sgn}(f)(t)$, $t \in T$, and $|P(t)| < 1$, $t \in T^c$. In (2.11) we choose the $P(t)$ that minimizes the $\ell_2$ norm on $T^c$ under the linear constraints $P(t) = \mathrm{sgn}(f)(t)$, $t \in T$. However, the condition $|P(t)| < 1$ suggests that a minimal $\ell_\infty$ choice would be more appropriate (but is seemingly intractable analytically).

Figure 3 illustrates how often the sufficient condition of $P(t)$ chosen as in (2.11) meets the constraint $|P(t)| < 1$, $t \in T^c$, for the same values of $\tau$ and $|T|$. The empirical bound on $T$ is stronger by about a factor of two; for $|T| < |\Omega|/10$, the success rate is very close to 100%.
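The sufficient-condition test can be reproduced in outline as follows. This is a sketch under stated assumptions: we take $P$ to be a trigonometric polynomial with frequencies in $\Omega$, and we obtain the minimum-energy interpolant by asking `numpy.linalg.lstsq` for the minimum-norm coefficient vector (by Parseval, minimizing the coefficient norm minimizes the $\ell_2$ norm of $P$, hence its $\ell_2$ norm on $T^c$ since $P$ is fixed on $T$). The function names are hypothetical, and this need not coincide with the formula used in (2.11).

```python
import numpy as np

def min_energy_dual(N, T, Omega, sign_on_T):
    """Minimum-l2 polynomial P(t) = sum_{w in Omega} c_w e^{2 pi i w t / N} with P = sgn(f) on T."""
    T = np.asarray(T)
    Omega = np.asarray(Omega)
    t_all = np.arange(N)
    E = np.exp(2j * np.pi * np.outer(t_all, Omega) / N)   # N x |Omega| synthesis matrix
    # Underdetermined system E[T] c = sgn(f)|_T; lstsq returns the minimum-norm solution.
    coeffs, *_ = np.linalg.lstsq(E[T], np.asarray(sign_on_T, dtype=complex), rcond=None)
    return E @ coeffs

def meets_sufficient_condition(P, T):
    """Check |P(t)| < 1 for every t outside T."""
    mask = np.ones(P.size, dtype=bool)
    mask[np.asarray(T)] = False
    return bool(np.max(np.abs(P[mask])) < 1)
```

Counting how often `meets_sufficient_condition` returns `True` over random draws of $T$ and $\Omega$ gives an empirical success rate of the kind plotted for the sufficient condition.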

As a final example of the effectiveness of this recovery framework, we show two more results of the type presented in Section 1.1: piecewise constant phantoms reconstructed from Fourier samples on a star. The phantoms, along with the minimum energy and minimum total-variation reconstructions (which are exact), are shown in Figure 4. Note that the total-variation reconstruction is able to recover very subtle image features; for example, both the short and skinny ellipse in the upper right hand corner of Figure 4(d) and the very faint ellipse in the bottom center are preserved. (We invite the reader to check [1] for related types of experiments.)
Figure 3: Sufficient condition test for $N = 512$. (a) The image intensity represents the percentage of the time $P(t)$ chosen as in (2.11) meets the condition $|P(t)| < 1$, $t \in T^c$. (b) A cross-section of the image in (a) at $|\Omega| = 64$. Note that the axes are scaled differently than in Figure 2.

6 Discussion 

We would like to close this paper by offering a few comments about the results obtained here and by discussing the possibility of generalizations and extensions.

6.1 Stability

In the introduction section, we argued that even if one knew the support $T$ of $f$, the reconstruction might be unstable. Indeed with knowledge of $T$, a reasonable strategy might be to recover $f$ by the method of least-squares, namely,

$$f = (\mathcal{F}_{T\to\Omega}^*\, \mathcal{F}_{T\to\Omega})^{-1}\, \mathcal{F}_{T\to\Omega}^*\, \hat f|_\Omega.$$

In practice, the matrix inversion might be problematic. Now observe that with the notations of this paper

$$\mathcal{F}_{T\to\Omega}^*\, \mathcal{F}_{T\to\Omega} \propto I_T - \frac{1}{|\Omega|} H_0.$$

Hence, for stability we would need $\frac{1}{|\Omega|}\|H_0\| \le 1 - \delta$ for some $\delta > 0$. This is of course exactly the problem we studied, compare Theorem 3.3. In fact, selecting $\alpha_M$ as suggested in the proof of our main theorem (see section 3.4) gives $\frac{1}{|\Omega|}\|H_0\| \le .42$ with probability at least $1 - O(N^{-M})$. This shows that selecting $|T|$ so as to obey the size condition of our main theorem, $|T| \lesssim |\Omega|/\log N$, actually provides stability.
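The least-squares strategy with known support, and the quantity that governs its stability, can be sketched as follows. This is an illustration under our own naming and conventions: `partial_fourier` builds the matrix of the map from signals supported on $T$ to their Fourier coefficients on $\Omega$ (with the sign convention of `np.fft.fft`), and the off-diagonal part of the normalized Gram matrix plays the role of $\frac{1}{|\Omega|} H_0$.

```python
import numpy as np

def partial_fourier(N, T, Omega):
    """|Omega| x |T| matrix with entries e^{-2 pi i w t / N}, w in Omega, t in T."""
    return np.exp(-2j * np.pi * np.outer(np.asarray(Omega), np.asarray(T)) / N)

def least_squares_recover(N, T, Omega, f_hat_obs):
    """Recover f supported on a known T from its Fourier coefficients on Omega."""
    F = partial_fourier(N, T, Omega)
    coef, *_ = np.linalg.lstsq(F, f_hat_obs, rcond=None)
    f = np.zeros(N, dtype=complex)
    f[np.asarray(T)] = coef
    return f

def normalized_H0_norm(N, T, Omega):
    """Spectral norm of I_T - F^* F / |Omega|, i.e. of the normalized off-diagonal part."""
    F = partial_fourier(N, T, Omega)
    gram = F.conj().T @ F / len(Omega)     # unit diagonal, since every entry of F has modulus 1
    return float(np.linalg.norm(np.eye(len(T)) - gram, 2))
```

When `normalized_H0_norm` returns a value $\rho < 1$, the eigenvalues of the normalized Gram matrix lie in $[1 - \rho, 1 + \rho]$, so its condition number is at most $(1 + \rho)/(1 - \rho)$; this is the sense in which a bound such as $.42$ yields a stable inversion.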
[Panels (a), (b), (c) in the top row; (d), (e), (f) in the bottom row.]

Figure 4: Two more phantom examples for the recovery problem discussed in Section 1.1. On the left is the original phantom ((d) was created by drawing ten ellipses at random), in the center is the minimum energy reconstruction, and on the right is the minimum total-variation reconstruction. The minimum total-variation reconstructions are exact.
6.2 Robustness

An important question concerns the robustness of the reconstruction procedure vis-à-vis measurement errors. For example, we might want to consider the model problem which says that instead of observing the Fourier coefficients of $f$, one is given those of $f + h$ where $h$ is some small perturbation. Then one might still want to reconstruct $f$ via

$$f^\sharp = \operatorname{argmin} \|g\|_{\ell_1}, \qquad \hat g(\omega) = \hat f(\omega) + \hat h(\omega), \quad \forall \omega \in \Omega.$$

In this setup, of course, one cannot expect exact recovery. Instead, one would like to know whether or not our reconstruction strategy is well-behaved or, more precisely, how far the minimizer $f^\sharp$ is from the true object $f$. In short, what is the typical size of the error? Our preliminary calculations suggest that the reconstruction is robust in the sense that the error $\|f - f^\sharp\|_1$ is small for small perturbations $h$ obeying $\|h\|_1 \le \delta$, say. We hope to be able to report on these early findings in a follow-up paper.



6.3 Extensions 

Finally, work in progress shows that similar exact reconstruction phenomena hold for other synthesis/measurement pairs. Suppose one is given a pair of bases $(B_1, B_2)$ and randomly selected coefficients of an object $f$ in one basis, say $B_2$. (From this broader viewpoint, the special cases discussed in this paper assume that $B_1$ is the canonical basis of $\mathbb{R}^N$ or $\mathbb{R}^N \times \mathbb{R}^N$ (spikes in 1D, 2D), or is the basis of Heavisides as in the total-variation reconstructions, and $B_2$ is the standard 1D, 2D Fourier basis.) Then, it seems that $f$ can be recovered exactly provided that it may be synthesized as a sparse superposition of elements in $B_1$.
The relationship between the number of nonzero terms in $B_1$ and the number of observed
coefficients depends upon the incoherence between the two bases (Zj. The more incoherent, 
the fewer coefficients needed. Again, we hope to report on such extensions in a separate 
publication. 
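To make the notion of incoherence concrete, one common measure (an assumption here, since the text does not fix a definition) is the largest inner product in modulus between an element of one orthonormal basis and an element of the other. The sketch below computes it for the spike/Fourier pair considered in this paper; the function name `mutual_coherence` is ours.

```python
import numpy as np

def mutual_coherence(B1, B2):
    """Largest |<b1_i, b2_j>| over the columns of two orthonormal bases (assumed measure)."""
    return float(np.max(np.abs(B1.conj().T @ B2)))

N = 64
spikes = np.eye(N)                                # canonical (spike) basis
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)      # unitary discrete Fourier basis
mu = mutual_coherence(spikes, fourier)            # every entry has modulus 1/sqrt(N)
```

For this pair the value is $1/\sqrt{N}$, the smallest possible for two orthonormal bases of $\mathbb{C}^N$, whereas comparing a basis with itself gives 1.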



7 Appendix 

7. 1 Proof of Theorem HOI 

We need to prove that for $\tau \le .44$ and $n \ge 4$,

$$(2n-1)^{2n}\,(c_\tau)^{-(2n-1)}\, N\,|T|^{2n-1} \le n^n\, 2^{n+1}\, e^{-n} \left(\frac{\tau}{1-\tau}\right)^{n} |T|^n\, N^n.$$

Now $(2n-1)^{2n} = (2n)^{2n}\, e^{-1}\, e_n$ where $e_n \le e^{1/2n}$, say. We may then rewrite the previous inequality as

$$e_n\,(2e)^{n-1}\, n^n\, (c_\tau)^{-2(n-1)}\, (1-\tau)^{n-1}\, |T|^{n-1} \le |\tau N|^{n-1}\, s_\tau$$
where s T = -^—c T . Because \T\ < s|ll^l < a < 1, it is sufficient to check that 

2a 2 e(l-r) 
Figure 5: Behavior of the left and right-hand side of (7.1) for two values of $n$ (left panel: $n = 4$; right panel: $n = 5$).



Note that plugging the value of 7 gives r T = (1 — r) 3 /(ea 2 ^ 2 [log(i^)] ) (recall = 
(l + \/5)/2). In other words, we want 

$$(n-1)\log r_\tau + \log n + \frac{1}{2n} \le \log s_\tau. \qquad (7.1)$$

Figure 5 illustrates the behavior of both the left-hand side and the right-hand side with $\alpha = 1$. Simple numerical calculations show that with $\alpha = 1$, (7.1) holds for $\tau \le .44$ and $n \ge 4$, as claimed.

7.2 Proof of Lemma 3.5

Set e 1 ^ = sgn(/) for and fix K. Using (|2.1U|) . we have 

ti,...,t„ + i£T:tj^tj + i for j=0,...,n uo,...,ii;„gn 

and, for example, 



\[(Ht*) n+1 e^](t )\ 2 = ^ jtttn+i) e -i4>(f n+1 ) 

tl,...,t n _(_l GT: tj^tj^_i for j — 0,. .. ,n 

t'...,t' 6T:t'#('-i, for 3=0,. ...Ti 
1' 2n j+1 

e iE7=o^(*i-*i+i) e -iE"=o^(*;-^+i). 

One can calculate the $2K$th moment in a similar fashion. Put $m := K(n+1)$ and $\omega := (\omega_j^{(k)})_{k,j}$, $t := (t_j^{(k)})_{k,j} \in T^{2K(n+1)}$, $1 \le j \le n+1$ and $1 \le k \le 2K$. With these notations, we have

where we adopted the convention that $t_0^{(k)} = t_0$ for all $1 \le k \le 2K$ and where it is understood that the condition $t_j^{(k)} \ne t_{j+1}^{(k)}$ is valid for $0 \le j \le n$.

Now the calculation of the expectation goes exactly as in Section 4. Indeed, we define an equivalence relation $\sim$ on the finite set $A := \{0, \ldots, n\} \times \{1, \ldots, 2K\}$ by setting $(j, k) \sim (j', k')$ if $\omega_j^{(k)} = \omega_{j'}^{(k')}$ and observe as before that

$$\mathbf{E}\Big[\prod_{(j,k) \in A} I_{\{\omega_j^{(k)} \in \Omega\}}\Big] = \tau^{|A/\sim|};$$

that is, $\tau$ raised at the power that equals the number of distinct $\omega$'s and, therefore, we can write the expected value $m(n;K)$ as



m(n;K)= £ e'E^C-l)^) £ r W~l 

J ' J + l 

E 

u>en(~) 



As before, we follow Lemma 4.5 and rearrange this as

m(n-K)= ^ E e^lU-^l) jj ^ /](r) 

As before, the summation over $\omega$ will vanish unless $t_{A'} := \sum_{(j,k) \in A'} (-1)^k\, (t_j^{(k)} - t_{j+1}^{(k)}) = 0$ for all equivalence classes $A' \in A/\sim$, in which case the sum equals $N^{|A/\sim|}$. In particular, if $A/\sim$ contains a singleton, the sum vanishes because of the constraint $t_j^{(k)} \ne t_{j+1}^{(k)}$, so we may just as well restrict the summation to those equivalence relations that contain no singletons. In particular we have

$$|A/\sim| \le K(n+1) = m. \qquad (7.2)$$

To summarize

$$m(n;K) = \sum_{\sim \in \mathcal{P}(A)} \;\; \sum_{\substack{t \in T^{2K(n+1)}:\; t_j^{(k)} \ne t_{j+1}^{(k)} \\ \text{and } t_{A'} = 0 \text{ for all } A'}} e^{i \sum_{k=1}^{2K} (-1)^k \phi(t_{n+1}^{(k)})}\; N^{|A/\sim|} \prod_{A' \in A/\sim} F_{|A'|}(\tau)$$

$$\le \sum_{\sim \in \mathcal{P}(A)} \;\; \sum_{\substack{t \in T^{2K(n+1)}:\; t_j^{(k)} \ne t_{j+1}^{(k)} \\ \text{and } t_{A'} = 0 \text{ for all } A'}} N^{|A/\sim|} \prod_{A' \in A/\sim} F_{|A'|}(\tau), \qquad (7.3)$$

since $\big|e^{i \sum_{k=1}^{2K} (-1)^k \phi(t_{n+1}^{(k)})}\big| = 1$. Observe the striking resemblance with (4.11). Let $\sim$ be an equivalence which does not contain any singleton. Then the following inequality holds

$$\#\{t \in T^{2K(n+1)} : t_{A'} = 0 \text{ for all } A' \in A/\sim\} \le |T|^{2K(n+1) - |A/\sim|}.$$

To see why this is true, observe that, as linear combinations of the $t_j^{(k)}$ and of $t_0$, the expressions $t_j^{(k)} - t_{j+1}^{(k)}$ are all linearly independent, and hence the expressions $\sum_{(j,k) \in A'} (-1)^k\, (t_j^{(k)} - t_{j+1}^{(k)})$ are also linearly independent. Thus we have $|A/\sim|$ independent constraints in the above sum, and so the number of $t$'s obeying the constraints is bounded by $|T|^{2K(n+1) - |A/\sim|}$.

With the notations of Section 4, we established

$$m(n;K) \le \sum_{k=1}^{m} N^k\, |T|^{2m-k}\, P(2m, k) \sup_{\sim:\, |A/\sim| = k} \;\prod_{A' \in A/\sim} F_{|A'|}(\tau). \qquad (7.4)$$

Now this is exactly the same as (4.13) which we proved obeys the desired bound.